NLP · Python

Stemming, Lemmatisation and POS-tagging with Python and NLTK

This article describes some pre-processing steps that are commonly used in Information Retrieval (IR), Natural Language Processing (NLP) and text analytics applications. In particular, the focus is on the comparison between stemming and lemmatisation, and on the need for part-of-speech tagging in this context. The discussion shows some examples in NLTK, also available as a Gist on GitHub.
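To illustrate the difference, here is a minimal sketch assuming NLTK is installed; `stem_all` and `lemmatise` are hypothetical helper names, not taken from the article. Stemming with `PorterStemmer` strips suffixes by rule and needs no extra data, whereas `WordNetLemmatizer` looks words up in the WordNet corpus (e.g. fetched with `nltk.download("wordnet")`) and needs a part-of-speech hint to pick the right lemma, which is where POS-tagging comes in.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()


def stem_all(words):
    """Reduce each word to its Porter stem (rule-based suffix stripping)."""
    return [stemmer.stem(w) for w in words]


def lemmatise(word, pos="n"):
    """Map a word to its WordNet lemma (hypothetical helper).

    pos is a WordNet tag such as "n" (noun) or "v" (verb); the WordNet
    corpus must be downloaded first, e.g. nltk.download("wordnet").
    """
    return WordNetLemmatizer().lemmatize(word, pos=pos)


# Stems need not be real words, and irregular forms are left untouched.
print(stem_all(["running", "studies", "ran"]))  # ['run', 'studi', 'ran']
```

With WordNet available, `lemmatise("ran", pos="v")` should return `"run"`, a mapping that no suffix-stripping rule can produce.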

Python · Sentiment Analysis

Sentiment Analysis with Python and scikit-learn

Sentiment analysis is a field of study that analyses people's opinions towards entities such as products, typically expressed in written form, for example in online reviews. In recent years it has been a hot topic in both academia and industry, partly thanks to the massive popularity of social media, which provide a constant source of textual data full of…

Python · Search

Searching PubMed with Python

PubMed is a search engine providing access to millions of biomedical citations. Users can search for biomedical references free of charge, and for some articles the full-text paper is also openly accessible. This post describes how you can programmatically search the PubMed database with Python, in order to integrate searching or browsing capabilities into your Python application. There are two…
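As a rough illustration, here is a minimal sketch that queries NCBI's E-utilities `esearch` endpoint directly with the standard library; the full article may use a different client (for example Biopython's `Entrez` module), and the helper names `build_esearch_url` and `search_pubmed` are hypothetical. Only `search_pubmed` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (returns PubMed IDs matching a query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, retmax=20):
    """Build an esearch URL for PubMed, asking for a JSON response."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def search_pubmed(query, retmax=20):
    """Return up to retmax PubMed IDs (as strings) for the query."""
    with urlopen(build_esearch_url(query, retmax)) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]
```

The returned IDs could then be passed to the `efetch` or `esummary` utilities to retrieve the citation details; NCBI also asks heavy users to identify themselves with `tool` and `email` parameters.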

Python

My Python Code is Slow? Tips for Profiling

tl;dr Before you can optimise slow code, you need to identify the bottlenecks; proper profiling gives you the right insights. This article discusses some profiling tools for Python.

Introduction

Python is a high-level programming language with an emphasis on readability. Some of its peculiarities, like dynamic typing or the (in)famous GIL, might have some trade-offs in terms…
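As a starting point, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules; `slow_sum_of_squares` is a hypothetical stand-in for real slow code, and `profile` is a hypothetical helper rather than a tool from the article.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Sort by cumulative time so the most expensive call paths come first,
    # and keep only the top 5 entries.
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(5)
    return result, stream.getvalue()


result, report = profile(slow_sum_of_squares, 100_000)
print(report)
```

For whole scripts, `python -m cProfile -s cumulative script.py` gives a similar report without changing the code.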