Data Mining · NLP · Python

Mining Twitter Data with Python (Part 2: Text Pre-processing)

This is the second part of a series of articles about data mining on Twitter. In the previous episode, we have seen how to collect data from Twitter. In this post, we’ll discuss the structure of a tweet and we’ll start digging into the processing steps we need for some text analysis. Table of Contents… Continue reading Mining Twitter Data with Python (Part 2: Text Pre-processing)

NLP · Python

Stemming, Lemmatisation and POS-tagging with Python and NLTK

This article describes some pre-processing steps that are commonly used in Information Retrieval (IR), Natural Language Processing (NLP) and text analytics applications. In particular, the focus is on the comparison between stemming and lemmatisation, and the need for part-of-speech tagging in this context. The discussion shows some examples in NLTK, also as Gist on github.… Continue reading Stemming, Lemmatisation and POS-tagging with Python and NLTK