Twitter is a popular social network where users can share short SMS-like messages called tweets. Users share thoughts, links and pictures on Twitter, journalists comment on live events, companies promote products and engage with customers. The list of different ways to use Twitter could be really long, and with 500 million tweets per day, there's a lot of data to analyse and play with.
This is the first in a series of articles dedicated to mining data on Twitter using Python. In this first part, we’ll see different options to collect data from Twitter. Once we have built a data set, in the next episodes we’ll discuss some interesting data applications.
Update July 2016: my new book on data mining for Social Media is out! Part of the content in this tutorial has been improved and expanded as part of the book, so please have a look. Chapter 2 about mining Twitter is available as a free sample from the publisher’s web site, and the companion code with many more examples is available on my GitHub
Table of Contents of this tutorial:
- Part 1: Collecting Data (this article)
- Part 2: Text Pre-processing
- Part 3: Term Frequencies
- Part 4: Rugby and Term Co-Occurrences
- Part 5: Data Visualisation Basics
- Part 6: Sentiment Analysis Basics
- Part 7: Geolocation and Interactive Maps
More updates: fixed version number of Tweepy to avoid problem with Python 3; fixed discussion on _json to get the JSON representation of a tweet; added example of process_or_store().
Register Your App
In order to have access to Twitter data programmatically, we need to create an app that interacts with the Twitter API.
The first step is the registration of your app. In particular, you need to point your browser to http://apps.twitter.com, log in to Twitter (if you're not already logged in) and register a new application. You can now choose a name and a description for your app (for example "Mining Demo" or similar). You will receive a consumer key and a consumer secret: these are application settings that should always be kept private. From the configuration page of your app, you can also request an access token and an access token secret. Similarly to the consumer keys, these strings must also be kept private: they provide the application access to Twitter on behalf of your account. The default permissions are read-only, which is all we need in our case, but if you decide to change your permissions to provide writing features in your app, you must negotiate a new access token.
Important Note: there are rate limits on the use of the Twitter API, as well as limitations in case you want to provide a downloadable data set; see:
- https://dev.twitter.com/overview/terms/agreement-and-policy
- https://dev.twitter.com/rest/public/rate-limiting
Accessing the Data
Twitter provides REST APIs you can use to interact with their service. There are also several Python-based clients out there that we can use without re-inventing the wheel. In particular, Tweepy is one of the most interesting and straightforward to use, so let's install it:
pip install tweepy==3.3.0
Update: the release 3.4.0 of Tweepy has introduced a problem with Python 3, currently fixed on GitHub but not yet available via pip; for this reason we're using version 3.3.0 until a new release is available.
More Updates: the release 3.5.0 of Tweepy, already available via pip, seems to solve the problem with Python 3 mentioned above.
In order to authorise our app to access Twitter on our behalf, we need to use the OAuth interface:
import tweepy
from tweepy import OAuthHandler

consumer_key = 'YOUR-CONSUMER-KEY'
consumer_secret = 'YOUR-CONSUMER-SECRET'
access_token = 'YOUR-ACCESS-TOKEN'
access_secret = 'YOUR-ACCESS-SECRET'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)
The api variable is now our entry point for most of the operations we can perform with Twitter.
For example, we can read our own timeline (i.e. our Twitter homepage) with:
for status in tweepy.Cursor(api.home_timeline).items(10):
    # Process a single status
    print(status.text)
Tweepy provides the convenient Cursor interface to iterate through different types of objects. In the example above we’re using 10 to limit the number of tweets we’re reading, but we can of course access more. The status variable is an instance of the Status() class, a nice wrapper to access the data. The JSON response from the Twitter API is available in the attribute _json (with a leading underscore), which is not the raw JSON string, but a dictionary.
So the code above can be re-written to process/store the JSON:
for status in tweepy.Cursor(api.home_timeline).items(10):
    # Process a single status
    process_or_store(status._json)
What if we want to have a list of all our followers? There you go:
for follower in tweepy.Cursor(api.followers).items():
    process_or_store(follower._json)
And how about a list of all our tweets? Simple:
for tweet in tweepy.Cursor(api.user_timeline).items():
    process_or_store(tweet._json)
In this way we can easily collect tweets (and more) and store them in the original JSON format, fairly easy to convert into different data models depending on our storage (many NoSQL technologies provide some bulk import feature).
The function process_or_store() is a place-holder for your custom implementation. In the simplest form, you could just print out the JSON, one tweet per line:
import json

def process_or_store(tweet):
    print(json.dumps(tweet))
Streaming
In case we want to "keep the connection open" and gather all the upcoming tweets about a particular event, the Streaming API is what we need. We'll extend StreamListener() to customise the way we process the incoming data. Here's a working example that gathers all the new tweets with the #python hashtag:
from tweepy import Stream
from tweepy.streaming import StreamListener

class MyListener(StreamListener):

    def on_data(self, data):
        try:
            with open('python.json', 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        print(status)
        return True

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(track=['#python'])
Depending on the search term, we can gather tons of tweets within a few minutes. This is especially true for live events with a world-wide coverage (World Cups, Super Bowls, Academy Awards, you name it), so keep an eye on the JSON file to understand how fast it grows and consider how many tweets you might need for your tests. The above script will save each tweet on a new line, so you can use the command wc -l python.json from a Unix shell to know how many tweets you’ve gathered.
You can see a minimal working example of the Twitter Stream API in the following Gist: https://gist.github.com/bonzanini/af0463b927433c73784d
Summary
We have introduced tweepy as a tool to access Twitter data in a fairly easy way with Python. There are different types of data we can collect, with the obvious focus on the “tweet” object.
Once we have collected some data, the possibilities in terms of analytics applications are endless. In the next episodes, we’ll discuss some options.
Very nice post, as usual. I am actually using part of this code to build a small application to download all the images of a Twitter account. I only have one small comment: for the sake of completeness, I believe you should also import json in your example.
Can you help me with this error: “NameError: name ‘process_or_store’ is not defined”
Hi Bozhidar,
the function is not defined in the code sample (hence the error); it's just a placeholder for you to personalise depending on your needs (storing the data, doing some pre-processing as described in Part 2, etc.). In the simplest form, you can just substitute it with a print() and dump all the JSON to a plain file
can you please explain more?
I tried to solve it this way and i get this error: AttributeError: ‘Status’ object has no attribute ‘json’
can you help me!
Hi Afnan, You should use the _json attribute (with a leading underscore)
huh. I got stuck there for a while
I tried to dump it with the print() command but it didn't give me a meaningful answer. Could you show me how to store the JSON in a simple file?
That's just a placeholder for what you want to do. E.g. you can use print, to print line by line.
Am working on a project using Python & Twitter and I chanced upon your site! Really cool & very useful! My interests lie in data science too, so I’ll be back :)
Hi Marco,
Really great intro. Don’t want to take up your time debugging, but was wondering if you know how to add a timer to the data collection process. I tried the following….
if (time.time() - start_time)/60 <= 1:
    twitter_stream = Stream(auth, MyListener())
    twitter_stream.filter(track=['#python'])
else:
    print("--- collection terminated at %s seconds ---") % round((time.time() - start_time), 2)
    import sys
    sys.exit()
It runs, but doesn't stop after 1 minute as intended. Any ideas?
hi basilspike,
with the if/else described in that way, the else branch is never executed (so the script doesn’t stop).
One option is to check the timer in the on_data() method, for example before saving/printing the tweets you could do:
if (time.time() - self.start_time)/60 >= 1:
    return False
You need to set self.start_time in the __init__ method. Please notice that in this case the script won’t stop after one minute, but only when you receive the first tweet that arrives after the one-minute timeout (could be a big difference for low volume streams).
Another option would be to use a subprocess for the download, and terminate it after the timeout is gone.
Hey !
This is the error I’m encountering on running the StreamListener code:
C:\Python27\lib\site-packages\requests-2.7.0-py2.7.egg\requests\packages\urllib3\util\ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
C:\Python27\lib\site-packages\requests-2.7.0-py2.7.egg\requests\packages\urllib3\connection.py:251: SecurityWarning: Certificate has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
SecurityWarning
What does this error signify? And how can I debug this error?
Hi Manas,
probably you’ll need to upgrade to a newer Python version, please have a look at the two links in the warning message you posted (the first one in particular)
Hey Marco, thanks a lot for this tutorial. How do I delete the lines from the json file after “n” minutes and keep streaming. I want to do this in time windows. For example:
1. Start the stream.
2. After 10 minutes, delete the lines from the json file.
3. Start writing again the tweets.
4. After 10 minutes, delete the lines again.
5. Start writing again the tweets.
6. Repeat 2 and 3 in a kind of loop (I don't know what to call it)
I want to do this because I want to clean my json file after an amount of time.
Ps: Sorry for my english. I’m just learning.
Hi Marco,
I got this error when streaming:
Can’t convert ‘bytes’ object to str implicitly
Any idea? Thanks!
Hi Catalin
Change the following lines in streaming.py:
l161: self._buffer += self._stream.read(read_len).decode('ascii')
l171: self._buffer += self._stream.read(self._chunk_size).decode('ascii')
This fixes it for python3.4, I do not know about python2.7
Hi Catalin, hi che and thanks for your input.
To expand on this, the problem is introduced with version 3.4.0 of tweepy for Python 3. Version 3.3.0 of tweepy, the one used when this article was written, is immune from this problem on Python 3.
The issue is open on github: https://github.com/tweepy/tweepy/issues/615
The workaround suggested by che works for Python 3, but if you don’t want to tamper with the libraries you could also simply downgrade tweepy, e.g.
pip install -I tweepy==3.3.0
Thanks Marco for your Twitter answer. However, I am new to Python, and I would need a bit more information. For now I am trying different packages, but I have difficulties making them work. I want to ask you what I need to run code such as Tweepy. I have Anaconda, PyCharm, TextWrangler, but I cannot find the way to run the code. I have installed tweepy through the terminal, and when I try to use it in the terminal, PyCharm or Anaconda, it gives different errors. Probably because Python is not well set up, I am not using the IDE properly, or any other beginner's reason.
Hi Javier, thanks for following up here.
So from the command line, firstly check that you can invoke the python interpreter correctly, e.g. typing "python --version" (it should give you 3.4.3 if you have the latest anaconda for python 3).
Anaconda doesn’t come with a package for tweepy, so if you try “conda install tweepy” you’ll see an error. You can anyway install it with “pip install tweepy==3.3.0”.
I’ve updated the post with a small example of how to use the stream API posted on github: https://gist.github.com/bonzanini/af0463b927433c73784d
so you can just save the file and run it (assuming you have the right credentials/tokens as explained in the article)
Hey, thank you for your effort. The output keeps giving me “401” ? What would it be?
Hi, 401 is the status code for “unauthorized”. Usually this problem is related to either missing or incorrect credentials (the OAuth part at the beginning of the article), so that’s where to look first.
Hey Marco, thank you for your reply.
The problem is that when I executed the previous commands such as user_timeline, friends, and home_timeline, those worked well and gave me the right output using the same credentials. When I use the streaming part, it gives me 401?
Hi Marco – I’m rather new at working with API’s, and your rundown of how to access Twitter data is of great help. Thanks!
I have one question though. It seems to me that the data is already available in JSON. By making use of the ._json key that is directly available through Tweepy, you don’t have to define and apply the parse method:
from pprint import pprint
for status in tweepy.Cursor(api.user_timeline).items():
    pprint(status._json)
Correct me if I’m wrong – I’m by no means an expert in this field.
Hi Nicolai, that’s correct and it’s something I have to fix in the article, it’s been in the pipeline for a while. It’s worth mentioning that the _json attribute is a dict, not the raw JSON string. Thanks for your input.
Cheers
Marco
Hi Marco,
Thanks for the great article. But I see that the place field has null values. I also want to fetch the country or location from where the text is being tweeted. Please help.
Hi Jigar, places and coordinates are given optionally by the users and are often omitted. Unfortunately, only a small share of tweets has these data set explicitly. Probably you’ll need to collect more data to see something interesting.
Cheers,
Marco
Hi Marco
When I wrote the first command "import tweepy" in python, I got the error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'tweepy'
I upgraded to python3 and it still gives same error, any clue why this happens?
Thanks
Deena
This is what I got when I was downloading tweepy through sudo pip install tweepy:
Downloading/unpacking tweepy
  Downloading tweepy-3.4.0.tar.gz
  Running setup.py egg_info for package tweepy
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "/Users/deena/build/tweepy/setup.py", line 17, in <module>
        install_reqs = parse_requirements('requirements.txt', session=uuid.uuid1())
    TypeError: parse_requirements() got an unexpected keyword argument 'session'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 14, in <module>
      File "/Users/deena/build/tweepy/setup.py", line 17, in <module>
        install_reqs = parse_requirements('requirements.txt', session=uuid.uuid1())
    TypeError: parse_requirements() got an unexpected keyword argument 'session'
Hi deena, the examples are tested on Python 3. You should be able to install the correct version of Tweepy with:
pip install tweepy==3.3.0
(I’ve updated the article with an explanation on the version numbers)
In general it’s more sensible to use a virtualenv rather than sudo
hello everyone, I have to collect tweets for a certain period (1/2015-9/2015) for my thesis. Reading the article, I started to register and create a new app, but I have no website and the field is required. Any suggestions?
thank you in advance
Hi Christina, the website is required but you can put a placeholder (as suggested also by Twitter). For example your twitter handle, or your github page, will do
thank you very much Marco
Marco another question,
I tried to install tweepy. I got stuck in cmd: when I type python setup.py install I get this: AttributeError: 'str' object has no attribute 'req'.
Hi, I’d recommend to use pip with a virtualenv to install the libraries, rather than using the bleeding-edge version from github — and keep in mind that the code in these articles has been tested on Python 3.4
Cheers, Marco
thank you very much Marco, your answers are very helpful. Another maybe silly question: can I run the code with Python IDLE? I'm at the point where I start to access the data with the OAuth interface
You can of course run the code from an interactive environment, although the sample script (https://gist.github.com/bonzanini/af0463b927433c73784d) is meant to be run from the command line. I think the script is more convenient for downloading the data initially, then you can do the data analysis interactively from IDLE/REPL if you wish
thank you very much Marco, everything worked just fine!
hello everyone! I would like to ask why, when retrieving tweets, I get some repeated ones. It's like the API returns a tweet more than once. How can I fix that?
thank you in advance
what is the difference between “def on_data ” and “def on_status ” ??
Hi, on_data() is the entry point for any sort of data received, while on_status() is specific to receiving statuses. You can implement the on_status() method directly if you prefer.
Cheers,
Marco
How would you store all your tweets in a separate json file instead of just printing them all out like you show in the process_or_store function? I see in later posts you reference the most common words in your tweets and I wanted to know how to do that.
Hi Aidan, similarly to the on_data() method in the streaming example, you can open the file and then just dump the json in it, something similar to:
(even without defining a custom process_or_store() here)
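Something similar to this sketch (the `dump_tweets()` helper and the filename are just examples, not part of tweepy):

```python
import json

def dump_tweets(tweets, filename):
    # tweets: any iterable of tweet dicts (e.g. status._json from tweepy)
    with open(filename, 'w') as f:
        for tweet in tweets:
            f.write(json.dumps(tweet) + '\n')

# With tweepy, assuming `api` is set up as in the article:
# dump_tweets((s._json for s in tweepy.Cursor(api.user_timeline).items()),
#             'my_tweets.json')
```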
Cheers,
Marco
Hey thanks for the help!
When I run your suggested code I’m getting an error:
TypeError: unsupported operand type(s) for +: ‘dict’ and ‘str’
Something to do with the +”\n” part I think, but I can’t figure out how to make it work and an internet search hasn’t helped much.
Any further advice you can give me would be hugely appreciated
I’ve updated the example: tweet._json is in fact a dictionary (loaded from the original json string, but still not a string), so you need json.dumps() first
Cheers,
Marco
Thank you for the help.
If I were trying to do the same thing but get all the tweets I have favorited in a json file, would the code structure be the same?
I tried with api.favorites instead of api.user_timeline, but I get an error code 429, which means too many requests.
I guess what I'm asking is: is there a way to get around the rate limit in this case, and what would you suggest?
Thanks again,
Aidan
Hi Aidan, when you’re hitting the rate limits, you need to slot a time.sleep() in the for loop, with the appropriate number of seconds. Details on rate limiting: https://dev.twitter.com/rest/public/rate-limiting
Marco,
Great page. Very helpful. I am a beginner who is attempting to use your guide to locate the geographical origin of certain hashtags.
For the process_or_store() function, how do I tweak its definition so that it saves the stream of tweets I collect as a json file (such as the ‘mytweets’ one you refer to in the second part of this guide).
Thanks in advance.
Hi Michael, thanks for the comment.
when downloading your tweets, you can for example open a file and dump the JSON of each tweet into it, one tweet per line (here you don't even need a custom process_or_store()).
For the streaming part, for example to download tweets with a given hashtag, the MyListener class defined in the article already stores the tweets on a file.
Also, please have a look at this: https://marcobonzanini.com/2016/08/02/mastering-social-media-mining-with-python/ (chapter 2, about mining Twitter, is available as a free sample from the publisher’s web site, and more sample code is available on my Github).
Cheers,
Marco
Marco,
I’ll give that a go now. Thanks for your help.
Marco,
Is there an easy way to put a time limit on the stream of data?
I was thinking of something like
import time
start_time = time.time()
end_time = start_time + 10
…
if end_time < time.time():
    sys.exit()
So my code looks like this:
import time
..
start_time = time.time()
end_time = start_time + 10

from tweepy import Stream
from tweepy.streaming import StreamListener

class MyListener(StreamListener):
    def on_data(self, data):
        start = time.time()
        try:
            with open('fresh2.json', 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            print('Error on_data', str(e))
            return True
        finally:
            if end_time < time.time():
                sys.exit()
But this does not work. Thanks in advance. Excellent page.
Hi Peter,
off the top of my head, if you want to achieve this programmatically, the simplest option is to set the desired end-time in the constructor of the custom listener, and then check the timing when you receive data, e.g.
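Something along these lines (a pure-Python sketch of the timing logic only; a real listener would subclass tweepy's StreamListener and be passed to Stream() as in the article):

```python
import time

class TimedListener:
    """Sketch of the timing logic; in real code this would subclass
    tweepy's StreamListener and be wired to Stream() as in the article."""

    def __init__(self, timeout_minutes=1):
        self.start_time = time.time()
        self.timeout = timeout_minutes * 60

    def on_data(self, data):
        # returning False from on_data tells tweepy to disconnect
        if time.time() - self.start_time >= self.timeout:
            return False
        with open('python.json', 'a') as f:
            f.write(data)
        return True
```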
(I haven’t tested this code, might need some tweaking)
one problem with this approach is that the check is triggered only if you receive data, so for a very low-volume stream you might keep the connection going for more than 10 seconds.
Another option is to use some OS facility, e.g. Linux has a timeout command (also available on Mac via "brew install coreutils", in this case called gtimeout): for example, timeout 10 python my_streaming_script.py (using your own script name) will run the python script and kill it after 10 seconds.
Cheers,
Marco
Hi Marco
I bought your “Mastering Social Media Mining with Python” book, and I’m trying to get the accompanying code on Github to work. I’m at twitter_streaming, which I haven’t been able to get working.
Following the instructions in the book, I did the following
– created a virtual environment
– set the four parameters (customer key and secret, access token and secret) as environment variables
– ran the twitter_client.py script (for authentication) from windows cmd prompt as follows
$ python twitter_client.py
– ran the twitter_streaming.py script from windows cmd prompt with the keywords as follows
$ python twitter_streaming.py keyword1 keyword2 keyword3
I’m getting 401 errors on running twitter_streaming.py. I tried to abort/exit the execution using Ctrl+C but no success – I’m unable to abort/exit the execution, and i’m getting a continuous stream of 401 errors.
My questions
– how to make sure that twitter_client.py was successfully executed?
– why do you think I’m getting 401 errors on executing twitter_streaming.py? and how can I abort/exit the execution to get to the prompt?
– is there a way to create the virtual environment and set the virtual environment variables and execute the accompanying scripts from inside the python interpreter rather than from the cmd prompt?
Note:
– I’ve been using python for about a year now, but still consider myself a beginner.
Thank you in advance
Ahmed
Hi Ahmed
error 401 is usually given because the credentials are incorrect, so possibly a copy-paste mismatch? If the variables are not set at all for the current session, the script would raise an error and quit.
You don't need to call twitter_client.py explicitly, because it's used by the other scripts to set up the authentication. If you set the environment variables from the command line, keep in mind that these variables are scoped to your existing session, i.e. when you close the console window they are not kept. I usually put all the environment setting commands in a single shell script that I run once per session (I also make sure that it's ignored by the source control and not checked in). You can check the value of these variables using "echo %VAR_NAME%", for example echo %TWITTER_CONSUMER_KEY%
If you’re not comfortable with defining the environment in this way, you can still hard-code the values, e.g. in twitter_client.py you can replace the get_twitter_auth() function:
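For example, a sketch of such a replacement (the credential strings are placeholders for your own keys):

```python
from tweepy import OAuthHandler

def get_twitter_auth():
    # placeholders instead of environment variables; fill in your own
    # keys, and be careful not to check them into source control
    auth = OAuthHandler('YOUR-CONSUMER-KEY', 'YOUR-CONSUMER-SECRET')
    auth.set_access_token('YOUR-ACCESS-TOKEN', 'YOUR-ACCESS-SECRET')
    return auth
```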
This is not a great design as explained in Chapter 2 (separation of concerns between app logic and config), but it’s good enough to get you started, just make sure that you don’t accidentally push your personal keys to github :)
Cheers,
Marco
How to convert the JSON-dictionary to CSV?
Hi
JSON can represent a nested structure, while CSV only a flat record-like one, so in the general case you can’t directly map JSON to CSV. You first need to find a way to normalise the JSON data. Once you ensure your data structure is flat, you can use the Python csv module (csv.writer in particular) to produce the CSV output.
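For example, a sketch for flat, top-level fields (the field names and the `tweets_to_csv()` helper are just examples):

```python
import csv
import json

def tweets_to_csv(json_lines, csv_file, fields=('id', 'text')):
    """Write the selected top-level tweet fields to CSV. Nested fields
    (e.g. the user's screen name) need to be flattened beforehand."""
    writer = csv.writer(csv_file)
    writer.writerow(fields)
    for line in json_lines:
        tweet = json.loads(line)
        writer.writerow([tweet.get(field, '') for field in fields])

# Usage, assuming the tweets were saved one JSON document per line:
# with open('python.json') as fin, open('tweets.csv', 'w', newline='') as fout:
#     tweets_to_csv(fin, fout)
```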
Cheers,
Marco
Thanks for the great tutorial! I’m new to Python, and was stuck trying to parse the JSON data.
I am trying to gather all the tweets about a particular event through the streaming API. I am able to get data for #apple, but when I try to stream data for #AirbnbWhileBlack, I get 1 tweet every 30 minutes.
I have tried registering a new app on the Twitter website, but I'm still facing the same issue. I have searched on Google but didn't find any solution to the problem. Does anyone have any idea how I can resolve this? Or any website through which I can collect data on #AirbnbWhileBlack?
Hi nayansinghal,
from what I can see, that hashtag has a very low frequency at the moment, i.e. only a few tweets per day, so what you're observing is in fact correct. You can look into the Twitter Search API (also supported by tweepy) rather than the streaming API used here, so you can go back approximately one week (but some tweets/users might be missing from the results, as explained in the documentation):
https://dev.twitter.com/rest/public/search
http://docs.tweepy.org/en/v3.5.0/api.html#API.search
Cheers,
Marco
Thanks Marco for your response. I checked the Twitter Search API and got only 70 tweets overall. I searched on Google and figured out that these APIs provide tweets only for the past 7 days. Is there any other method through which I can get tweets older than 1 week?
I know one solution is to extract tweets using a web page scraper, but that doesn't seem like a good idea to me.
In the line
with open('python.json', 'a') as f:
it highlights with and says: "expected an indented block"
Hi,
you need to check the correct indentation of the code as shown in the article (sometimes with copy&paste the indentation gets lost)
Cheers,
Marco
The documentation says the return type of the search method is a list of Search Results, how can I extract the tweets of this?
I figured this out, but thanks!
Great :)
I'm interested in getting tweets of somebody's followers, but I only get up to 20 followers, even though this person has more. Why does this happen?
Hi Eduardo, if you check out the examples from my book (Chap02-03), there are some examples of how to use pagination with Tweepy, I think you can easily rearrange them for your needs… have a look at https://github.com/bonzanini/Book-SocialMediaMiningPython
Cheers
Marco
Hi, I have a doubt. I installed tweepy 3.3.0, however I am not able to import it in my Jupyter notebook or in Anaconda Spyder. Could you please help?
Hey Sir,
I want to download one-month-old tweets regarding some keywords. How can I do that?
Hi, unfortunately the Twitter Search API only lets you go back in time to about two weeks, older tweets are not available.
Cheers,
Marco
i got Error 401??
Hi anjanna, error 401 happens because of unauthorised access: https://dev.twitter.com/overview/api/response-codes
You’ll need to check that your app is correctly registered and you have set your credentials in the script
Cheers,
Marco
Can I filter out tweets with retweet_count less than X?
Thanks :)
Hi, I suggest you do some post-processing after the streaming is closed.
For example:
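A sketch along these lines (untested; adjust the filenames and threshold to your needs):

```python
import json

def filter_by_retweets(lines, min_retweets=5):
    """Yield only the tweets retweeted at least `min_retweets` times."""
    for line in lines:
        tweet = json.loads(line)
        if tweet.get('retweet_count', 0) >= min_retweets:
            yield line

# Usage:
# with open('tweets.json') as fin, open('tweets_filtered.json', 'w') as fout:
#     for line in filter_by_retweets(fin, min_retweets=5):
#         fout.write(line)
```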
This will read your tweets.json file and create a tweets_filtered.json with only tweets that have been retweeted at least 5 times
Cheers,
Marco
You’re fast! thanks for your answer.
Ciao :)
Good afternoon, I'm a PhD student and I have a question. Would it be possible to get user data within a "zone", i.e. of all the users that use Twitter within that zone? Thank you very much in advance.
Hi Sameque, the streaming API allows you to define a specific location using geo coordinates as described here: https://dev.twitter.com/streaming/overview/request-parameters#locations
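As a sketch (the coordinates below roughly cover the San Francisco area; the `in_bounding_box()` helper is an illustrative post-filter, not part of the API):

```python
def in_bounding_box(lon, lat, bbox):
    """Check whether a point falls inside a bounding box given as
    (sw_lon, sw_lat, ne_lon, ne_lat), the same order used by the
    streaming API's `locations` parameter."""
    sw_lon, sw_lat, ne_lon, ne_lat = bbox
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat

# With the streaming example from the article:
# twitter_stream.filter(locations=[-122.75, 36.8, -121.75, 37.8])
```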
Cheers
Marco