Filtering of tweets received from statuses/filter (streaming API)

2024/10/1 15:18:28

I have N different keywords that I am tracking (for the sake of simplicity, let N=3). So in GET statuses/filter, I pass 3 keywords in the "track" argument.

Now the tweets I receive can match ANY of the 3 keywords. The problem is that I want to determine which tweet corresponds to which keyword, i.e. a mapping between tweets and the keyword(s) given in the "track" argument.

Apparently, there is no way to do this without doing some processing on the tweets received.

So I was wondering: what is the best way to do this processing? Search for the keywords in the text of the tweet? What about case-insensitive matching? What about when a single keyword contains multiple words, e.g. "Katrina Kaif"?

I am currently trying to formulate some regular expression...

I was thinking the BEST way would be to use the same logic (regular expressions etc.) that the statuses/filter API itself uses. How can I find out what logic the Twitter statuses/filter API uses to match tweets to the keywords?

Advice? Help?

P.S.: I am using Python, Tweepy, regex, and MongoDB/Apache S4 (for distributed computing).

Answer

The first thing that comes to mind is to create a separate stream for every keyword and start each one in its own thread, like this:

    from threading import Thread

    import tweepy

    # Uses the legacy tweepy.StreamListener API from Tweepy 3.x.


    class StreamListener(tweepy.StreamListener):
        def __init__(self, keyword, api=None):
            super(StreamListener, self).__init__(api)
            self.keyword = keyword  # the single keyword this stream tracks

        def on_status(self, tweet):
            # Not reached here: overriding on_data below bypasses the default dispatch.
            print('Ran on_status')

        def on_error(self, status_code):
            print('Error: ' + repr(status_code))
            return False  # stop the stream on error

        def on_data(self, data):
            # Every raw message on this stream matched self.keyword.
            print('%s %s' % (self.keyword, data))
            print('Ok, this is actually running')


    def start_stream(auth, track):
        tweepy.Stream(auth=auth, listener=StreamListener(track)).filter(track=[track])


    # Replace the placeholders with your own credentials.
    auth = tweepy.OAuthHandler('<consumer_key>', '<consumer_secret>')
    auth.set_access_token('<key>', '<secret>')

    track = ['obama', 'cats', 'python']
    for item in track:
        thread = Thread(target=start_stream, args=(auth, item))
        thread.start()

If you would rather distinguish tweets by keyword yourself within a single stream, Twitter's documentation for the track request parameter describes how phrases are matched against tweets. There are some edge cases that could cause problems.
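As a rough sketch of the single-stream approach: Twitter's documentation for track describes commas as logical ORs and the spaces inside a phrase as logical ANDs, matched ignoring case and regardless of term order. Assuming that description holds, something like the following could map a tweet's text back to the phrases it matches. The names tokenize and match_keywords are my own, the tokenizer is only an approximation of whatever Twitter really does, and only the tweet text is checked here.

```python
import re

# Approximate tokenizer: runs of letters, digits and underscores.
# This is an assumption, not Twitter's actual tokenization.
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Return the set of lowercased word tokens in text."""
    return set(TOKEN_RE.findall(text.lower()))


def match_keywords(tweet_text, keywords):
    """Return the tracked phrases whose terms all occur in tweet_text.

    A phrase matches when every one of its terms is present as a whole
    token, in any order and ignoring case.
    """
    tokens = tokenize(tweet_text)
    matched = []
    for phrase in keywords:
        terms = tokenize(phrase)
        if terms and terms <= tokens:  # all terms present
            matched.append(phrase)
    return matched


if __name__ == "__main__":
    track = ["obama", "cats", "Katrina Kaif"]
    # Prints ['cats', 'Katrina Kaif']
    print(match_keywords("Kaif and KATRINA love #cats", track))
```

In a listener you would parse the JSON in on_data and call match_keywords on the status text with the same list you passed as track. As far as I understand the documentation, Twitter also matches against some entity fields (expanded and display URLs, hashtag text, mentioned screen names), so a tweet can arrive with an empty result from this helper; that is one of the edge cases to handle.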

Hope that helps.

