Twitter scraping of older tweets

2024/11/15 6:01:49

I am working on a project that needs tweets from Twitter. I used the Twitter API, but it only returns tweets from the last 7-9 days, and I also want tweets that are a few months old. So I decided to scrape Twitter using BeautifulSoup and later Selenium, but when I parse the page I don't get the elements I am looking for; I get what looks like the view-source of the entire webpage. Please help!

import requests
from bs4 import BeautifulSoup

f = requests.get("https://twitter.com/search?q=%23......%20until%3A2020-02-07%20since%3A2020-01-01&src=typed_query").text
soup = BeautifulSoup(f, 'html.parser')
print(soup)
name = soup.find_all('span', class_="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0")
print(name)

The output from printing soup is below. I don't know how to describe it, but it is the view-source of the page rather than the actual HTML elements:

{"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},t.t=function(e,n){if(1&n&&(e=t(e)),8&n)return e;if(4&n&&"object"==typeof e&&e&&e.__esModule)return e;var d=Object.create(null);if(t.r(d),Object.defineProperty(d,"default",{enumerable:!0,value:e}),2&n&&"string"!=typeof e)for(var o in e)t.d(d,o,function(n){return e[n]}.bind(null,o));return d},t.n=function(e){var n=e&&e.__esModule?function(){return e.default}:function(){return e};return t.d(n,"a",n),n},t.o=function(e,n){return Object.prototype.hasOwnProperty.call(e,n)},t.p="https://abs.twimg.com/responsive-web/web/",t.oe=function(e){throw e};var i=window.webpackJsonp=window.webpackJsonp||[],c=i.push.bind(i);i.push=n,i=i.slice();for(var l=0;l<i.length;l++)n(i[l]);var u=c;d()}([]),window.__SCRIPTS_LOADED__.runtime=!0;
//# sourceMappingURL=runtime.cc3200a4.js.map
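
A likely explanation for this output: the search page appears to be rendered in the browser by JavaScript, so the HTML that requests receives is mostly a shell of script tags, and the span elements being searched for are not in it yet. The sketch below illustrates the idea with the standard library only. Note that shell_html is a made-up stand-in for such a page, not Twitter's real markup, and TagCounter and count_tags are illustrative names.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags to see what a response actually contains."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of opening tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical stand-in for a page whose content is built by JavaScript.
shell_html = """
<html><head><script src="runtime.js"></script></head>
<body><div id="react-root"></div>
<script>window.__SCRIPTS_LOADED__ = {};</script></body></html>
"""

counts = count_tags(shell_html)
# Scripts are present, but there are no span elements to find.
print("script tags:", counts["script"], "span tags:", counts["span"])
```

Running the same count on the text returned by requests.get would show whether the spans exist in the raw response at all.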

The Selenium output is the same:

import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

PATH = "C:\\Program Files\\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://twitter.com")
email = driver.find_element_by_name('session[username_or_email]')
password = driver.find_element_by_name('session[password]')
email.send_keys('......')
password.send_keys("......")
password.send_keys(Keys.RETURN)
time.sleep(1)
driver.get('https://twitter.com/search?q=%23....%20until%3A2020-02-07%20since%3A2020-01-01&src=typed_query')
time.sleep(1)
print(driver.page_source)
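
The search URL used in both snippets above packs a hashtag plus the until: and since: operators into a percent-encoded q parameter. As a minimal sketch, it could be built with urllib.parse.quote instead of being encoded by hand. The function name build_search_url and the hashtag "python" are placeholders, since the real hashtag is elided in the question.

```python
from urllib.parse import quote


def build_search_url(hashtag, since, until):
    """Build a search URL in the same shape as the one in the question."""
    query = f"#{hashtag} until:{until} since:{since}"
    # safe="" makes quote() encode every reserved character, including ':'.
    return f"https://twitter.com/search?q={quote(query, safe='')}&src=typed_query"


# "python" is a placeholder hashtag, not the one from the question.
url = build_search_url("python", since="2020-01-01", until="2020-02-07")
print(url)
```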
Answer

GetOldTweets3 lets you extract historical tweets and filter them by criteria such as time frame, location, handle, or search query, without needing an API key.

E.g.

import GetOldTweets3 as got

# Tweet params
search_term = 'china trade war'
start_date = '2017-01-01'
end_date = '2020-01-01'

# Define historical tweets criteria
tweet_criteria = got.manager.TweetCriteria().setUsername('reuters') \
    .setQuerySearch(search_term) \
    .setSince(start_date) \
    .setUntil(end_date)

# Return tweets based on tweet criteria
tweets = got.manager.TweetManager.getTweets(tweet_criteria)

# getTweets returns a list, so index into it to read one tweet's text
tweets[0].text

Note that you can access further tweet attributes, such as hashtags and retweets, on each tweet object, for example:

other_tweet_attributes = [[tweet.username, tweet.hashtags] for tweet in tweets]
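
Once the attributes are collected, one option is to write them out as CSV for later analysis. The sketch below is only an illustration: the Tweet namedtuple is a stand-in that mirrors the username, hashtags, and text attributes used above, so that it runs without the library installed, and tweets_to_csv is a hypothetical helper name.

```python
import csv
import io
from collections import namedtuple

# Stand-in for the library's tweet objects; only three attributes are mirrored.
Tweet = namedtuple("Tweet", ["username", "hashtags", "text"])


def tweets_to_csv(tweets, fields=("username", "hashtags", "text")):
    """Serialize the chosen attributes of each tweet into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)  # header row
    for tweet in tweets:
        writer.writerow([getattr(tweet, field) for field in fields])
    return buffer.getvalue()


# Made-up sample data, not real tweets.
sample = [
    Tweet("reuters", "#trade", "Example tweet text"),
    Tweet("reuters", "", "Another example"),
]
csv_text = tweets_to_csv(sample)
print(csv_text)
```

The same function should work on the list returned by getTweets, as long as the objects expose the attribute names passed in fields.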
