
Mastering Social Media Mining with Python


Great news, my book on data mining for social media is finally out!

The title is Mastering Social Media Mining with Python. I’ve been working with Packt Publishing over the past few months, and the book was finalised and released in July.


As part of Packt’s Mastering series, the book assumes that readers already have a basic understanding of Python (e.g. for loops and classes), while more advanced concepts are discussed with examples. No particular experience with social media APIs or data mining is required. The book runs to 300+ pages, and by the end of it readers should be able to build their own data mining projects using data from social media and Python tools.

A bird’s eye view of the content:

  1. Social Media, Social Data and Python
    • Introduction to Social Media and Social Data: challenges and opportunities
    • Introduction to Python tools for Data Science
    • Overview of the use of public APIs to interact with social media platforms
  2. #MiningTwitter: Hashtags, Topics and Time Series
    • Interacting with the Twitter API in Python
    • Twitter data: the anatomy of a tweet
    • Entity analysis, text analysis, time series analysis on tweets
  3. Users, Followers, and Communities on Twitter
    • Analysing who follows whom
    • Mining your followers
    • Mining communities
    • Visualising tweets on a map
  4. Posts, Pages and User Interactions on Facebook
    • Interacting with the Facebook Graph API in Python
    • Mining your posts
    • Mining Facebook Pages
  5. Topic analysis on Google Plus
    • Interacting with the Google Plus API in Python
    • Finding people and pages on G+
    • Analysis of notes and activities on G+
  6. Questions and Answers on Stack Exchange
    • Interacting with the StackOverflow API in Python
    • Text classification for question tags
  7. Blogs, RSS, Wikipedia, and Natural Language Processing
    • Blogs and web pages as social data
    • Web scraping with Python
    • Basics of text analytics on blog posts
    • Information extraction from text
  8. Mining All the Data!
    • Interacting with many other APIs and types of objects
    • Examples of interaction with YouTube, Yelp and GitHub
  9. Linked Data and the Semantic Web
    • The Web as Social Media
    • Mining relations from DBpedia
    • Mining geo coordinates

The detailed table of contents is available on Packt Pub’s page. Chapter 2 is also offered as a free sample.

Please have a look at the companion code for the book on my GitHub to get an idea of the applications discussed in the book.

11 thoughts on “Mastering Social Media Mining with Python”

  1. I bought your book to help me learn about the Twitter API. The tip about the JSON Lines format (.jsonl) has already cleared up a lot of confusion I had regarding storing and retrieving tweets. Thanks a lot. By the way, the code I downloaded for twitter_map_example.py had zoom_start set to 17, which is too high to see both London and Paris in the same window. The book has this value at 5, which is much better.


    1. Hi Alex,
      that excessive zoom was probably me fooling around with the examples, but as you noticed the book has the correct, intended value. I’ve restored the right value in the GitHub repo. Many thanks for reporting it!
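
For readers who have not met the JSON Lines format (.jsonl) mentioned in this thread: it stores one JSON document per line, so tweets can be appended and read back one at a time. What follows is only a minimal sketch using the standard library. The file name and the sample tweets are made up for illustration and are not taken from the book or its companion code.

```python
import json
import os
import tempfile

# Hypothetical sample data: real tweets from the Twitter API carry many more fields.
tweets = [
    {"id": 1, "text": "Hello from London"},
    {"id": 2, "text": "Bonjour de Paris"},
]

# Hypothetical file name, placed in a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")

# Store: write one JSON document per line.
with open(path, "w", encoding="utf-8") as f:
    for tweet in tweets:
        f.write(json.dumps(tweet) + "\n")

# Retrieve: parse each non-empty line independently.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]
```

Because every line is parsed on its own, a large file can be processed line by line without loading it all into memory.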


  2. Hi Marco, I’m running a small lab in Soweto, teaching young people to code for free. We have no budget, so can we use the code from GitHub as class material without purchasing the book?
    We love your work, keep going.


  3. Hi Marco!

    I’m working with Python 3.5.0.
    This snippet appears in chapter 1, section 3:

    >>> from nltk.tokenize import TweetTokenizer
    >>> tokenizer = TwitterTokenizer()
    >>> tweet = '@marcobonzanini: an example! :D http://example.com #NLP'
    >>> print(tokenizer.tokenize(tweet))
    # ['@marcobonzanini', ':', 'an', 'example', '!', ':D', 'http://example.com', '#NLP']

    the second line:

    >>> tokenizer = TwitterTokenizer()

    returns this:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'TwitterTokenizer' is not defined

    I’m guessing that TwitterTokenizer is no longer provided by nltk?

    Where could I go for support on this issue?


    1. Hi Bob, this will go in the “errata” section.
      The correct line should be:

      tokenizer = TweetTokenizer()

      (i.e. “Tweet” instead of “Twitter”, like the class just imported in the first line).
      Thanks for reporting this.

