Mastering Social Media Mining with Python


Great news: my book on data mining for social media is finally out!

The title is Mastering Social Media Mining with Python. I’ve been working with Packt Publishing over the past few months, and in July the book was finalised and released.

As part of Packt’s Mastering series, the book assumes readers already have a basic understanding of Python (e.g. for loops and classes), but more advanced concepts are discussed with examples. No particular experience with social media APIs or data mining is required. By the end of the book’s 300+ pages, readers should be able to build their own data mining projects using data from social media and Python tools.

A bird’s-eye view of the content:

  1. Social Media, Social Data and Python
    • Introduction to Social Media and Social Data: challenges and opportunities
    • Introduction to Python tools for Data Science
    • Overview of the use of public APIs to interact with social media platforms
  2. #MiningTwitter: Hashtags, Topics and Time Series
    • Interacting with the Twitter API in Python
    • Twitter data: the anatomy of a tweet
    • Entity analysis, text analysis, time series analysis on tweets
  3. Users, Followers, and Communities on Twitter
    • Analysing who follows whom
    • Mining your followers
    • Mining communities
    • Visualising tweets on a map
  4. Posts, Pages and User Interactions on Facebook
    • Interacting with the Facebook Graph API in Python
    • Mining your posts
    • Mining Facebook Pages
  5. Topic analysis on Google Plus
    • Interacting with the Google Plus API in Python
    • Finding people and pages on G+
    • Analysis of notes and activities on G+
  6. Questions and Answers on Stack Exchange
    • Interacting with the StackOverflow API in Python
    • Text classification for question tags
  7. Blogs, RSS, Wikipedia, and Natural Language Processing
    • Blogs and web pages as social data
    • Web scraping with Python
    • Basics of text analytics on blog posts
    • Information extraction from text
  8. Mining All the Data!
    • Interacting with many other APIs and types of objects
    • Examples of interaction with YouTube, Yelp and GitHub
  9. Linked Data and the Semantic Web
    • The Web as Social Media
    • Mining relations from DBpedia
    • Mining geo coordinates

The detailed table of contents is available on the Packt Publishing page, and Chapter 2 is also offered as a free sample.

Please have a look at the book’s companion code on my GitHub to get an idea of the applications discussed in the book.


22 thoughts on “Mastering Social Media Mining with Python”

  1. I bought your book to help me learn about the Twitter API. The tip about the JSON Lines format (.jsonl) has already cleared up a lot of confusion I had regarding storing and retrieving tweets. Thanks a lot. By the way, the code I downloaded for twitter_map_example.py had zoom_start set to 17, which is too high to see both London and Paris in the same window. The book has this value at 5, which is much better.
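
    For anyone unfamiliar with the format, here is a minimal sketch of storing and reading tweets as JSON Lines, i.e. one JSON document per line (the sample tweets below are placeholders, not real API responses):

    import json

    # placeholder tweets standing in for real API responses
    tweets = [{'id': 1, 'text': 'Hello'}, {'id': 2, 'text': 'World'}]

    # write one JSON document per line
    with open('tweets.jsonl', 'w') as fout:
        for tweet in tweets:
            fout.write(json.dumps(tweet) + '\n')

    # read the tweets back line by line, without loading the file as a single document
    with open('tweets.jsonl') as fin:
        tweets = [json.loads(line) for line in fin]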

    1. Hi Alex,
      that excessive zoom was probably me fooling around with the examples, but as you noticed the book has the correct/intended value. I’ve reverted to the right value in the GitHub repo; many thanks for reporting it!
      Cheers,
      Marco

  2. Hi Marco, I’m running a small lab in Soweto teaching young people to code for free. Can we use the code from GitHub as class material without purchasing the book, since we have no budget?
    We love your work, keep going.

  3. Hi Marco!

    I’m working with Python 3.5.0.
    This snippet appears in Chapter 1, Section 3:

    >>> from nltk.tokenize import TweetTokenizer
    >>> tokenizer = TwitterTokenizer()
    >>> tweet = '@marcobonzanini: an example! :D http://example.com #NLP'
    >>> print(tokenizer.tokenize(tweet))
    # ['@marcobonzanini', ':', 'an', 'example', '!', ':D', 'http://example.com', '#NLP']

    the second line:

    >>> tokenizer = TwitterTokenizer()

    returns this:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'TwitterTokenizer' is not defined

    I’m guessing that TwitterTokenizer is no longer provided by nltk?

    Where could I go to for support on this issue?

    1. Hi Bob, this will go in the “errata” section.
      The correct line should be:

      tokenizer = TweetTokenizer()

      (i.e. “Tweet” instead of “Twitter”, like the class just imported in the first line).
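
      For reference, the full corrected snippet:

      >>> from nltk.tokenize import TweetTokenizer
      >>> tokenizer = TweetTokenizer()
      >>> tweet = '@marcobonzanini: an example! :D http://example.com #NLP'
      >>> print(tokenizer.tokenize(tweet))
      # ['@marcobonzanini', ':', 'an', 'example', '!', ':D', 'http://example.com', '#NLP']
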
      Thanks for reporting this.
      Cheers,
      Marco

  4. Marco, thanks for publishing this book. I do have one problem, though, and it is not your book: Twitter seems to be rejecting OAuth authentication calls from Tweepy. I have quadruple-checked my code and my access keys, but I still get the 215 error “Bad Authentication Data”. I also tried using Twython and got the same result. It seems to be an ongoing problem with Twitter, based on Stack Overflow comments. Do you know anything about this? Thanks!

  5. Hi Lorne,
    error 215 “Bad Authentication Data” happens when you don’t authenticate or when you send empty authentication, so assuming you’re following all the right steps to set up the app, I’m not sure where to look. I’ve tested the code from the book again with both old and new apps (i.e. re-using existing access keys and getting new ones) and everything runs smoothly.
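
    For comparison, here’s a minimal Tweepy authentication sketch (the credential values are placeholders for your own keys):

    import tweepy

    # replace with the credentials from your Twitter app settings
    CONSUMER_KEY = 'your-consumer-key'
    CONSUMER_SECRET = 'your-consumer-secret'
    ACCESS_TOKEN = 'your-access-token'
    ACCESS_TOKEN_SECRET = 'your-access-token-secret'

    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)

    # fails with an authentication error (e.g. 215) if the credentials are not accepted
    print(api.verify_credentials().screen_name)
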
    Marco

    1. Thank you, Marco. I will take another look at my code. I copied it straight from the book, but I could have made an error that I didn’t catch the first five times I looked. I am sorry to bother you with this.

      Warmest regards,

      Lorne


  6. Sir, excellent book. My humble suggestion: in the future, please consider writing a book about text analysis. I am doing a master’s in computational engineering and my project involves collecting data for sentiment analysis. Since I am from India, a lot of the text (comments or tweets) that I collected from Twitter and Facebook is in transliterated or code-mixed form. This is really challenging because there is no tool or library that addresses the morphological complexity of code-mixed text. I need a suggestion on how to deal with text of such a complex nature. How can I classify sentiment for this kind of text?
    I would be happy to share some of the complexities involved in my project, and my research interests, with you by email.

    Thank you.

  7. Dear Marco, thank you for the inspiring book. I am having a hard time with a Twitter error response: status code 400. I think I have looked everywhere online and offline to solve the problem, without any result. I would really appreciate it if you could help me.

  8. Dear Marco, I too wanted to thank you for this excellent book. Two remarks from my side:
    – in the Chap04 facebook_get_page_posts.py example, you should replace graph.get_connections('PacktPub', with graph.get_connections(args.page, in order to collect posts from the right page (and not PacktPub…) – see the sketch below
    – do you plan to develop a LinkedIn section in the future? This is really missing.
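
    A minimal sketch of the corrected pattern, assuming the facebook-sdk client used in the book (the access token value is a placeholder):

    import argparse
    import facebook  # facebook-sdk

    ACCESS_TOKEN = 'your-access-token'

    parser = argparse.ArgumentParser()
    parser.add_argument('--page', default='PacktPub')
    args = parser.parse_args()

    graph = facebook.GraphAPI(access_token=ACCESS_TOKEN)
    # fetch the posts of the page passed on the command line, not a hard-coded one
    posts = graph.get_connections(args.page, 'posts')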

    1. Thanks for the kind words. To answer your suggestions:
      – the problem with args.page was documented and fixed in the GitHub repo a while ago (thanks to a previous suggestion from a translator); I use the GitHub repo to keep track of corrections as they pop up (thanks for the suggestion anyway!)
      – the problem with unicode is related to using Python 2. My suggestion is to upgrade to Python 3 as soon as possible, because Python 2 is getting close to its end-of-life date; the code in the book is written in Python 3 and doesn’t explicitly support Python 2, although many snippets work without too much trouble. Among other reasons for upgrading, unicode is much less painful in Py3 than in Py2. If you’re stuck with Python 2, your suggestion does indeed help, but my recommendation is to upgrade asap :) (a short example is below)
      – I’m not planning any new edition at the moment. I have to say that originally, when I was thinking about the book outline a few years ago, I was planning to include LinkedIn, but then their API became very limited for public use, i.e. you have to pay to get access to the most interesting analytics features, and the public access is not very interesting, so I preferred to focus on platforms that people can openly use without paying a fee.
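
      To illustrate the Py2 behaviour, a minimal example (hypothetical, not from the book):

      # -*- coding: utf-8 -*-
      # Python 2 only: make all string literals unicode by default
      # (on Python 3 this import is a no-op)
      from __future__ import unicode_literals

      message = 'café'
      # prints <type 'unicode'> with the import above, <type 'str'> without it
      print(type(message))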

      Thanks again for the nice words

      Cheers
      Marco

  9. Also, if you plan to display messages with unicode characters (for French speakers, for example), you should recommend adding from __future__ import unicode_literals at the beginning of facebook_top_posts.py

  10. Dear Marco, I have a quick question: in your book you have an example of streaming with the following command:
    python twitter_streaming.py \#RWC2015 \#RWCFinal rugby
    I would like to know how the file is saved and where it goes, because when I run that command on my computer, it keeps thinking and does not produce any output; actually, I need to stop it with Ctrl+C.

    Thank you very much.

    1. Hi Raul, the output of twitter_streaming.py is stored in a file called stream_[your query].jsonl, where [your query] is the set of keywords or hashtags that you track, in this case “stream__RWC2015__RWCFinal_rugby.jsonl” (notice that the # symbol and other special characters are converted to underscores). The file will be in the same folder as the twitter_streaming.py script, and it will be created only when the first tweet is received.
      If you’re tracking #RWC2015 today, you probably won’t see any new tweets coming through (that’s why no file was being saved), so I’d suggest testing the script with keywords that are not related to past events like the 2015 Rugby World Cup.
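
      As a sketch of how the conversion works (a hypothetical helper, not necessarily the book’s exact code):

      import re

      def safe_filename(keywords):
          # join the tracked keywords and replace any non-word character
          # (e.g. '#') with an underscore
          query = re.sub(r'\W', '_', '_'.join(keywords))
          return 'stream_{}.jsonl'.format(query)

      print(safe_filename(['#RWC2015', '#RWCFinal', 'rugby']))
      # stream__RWC2015__RWCFinal_rugby.jsonl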

      Best regards
      Marco
