Big Data · Python · Spark

Getting Started with Apache Spark and Python 3

Apache Spark is a cluster computing framework, currently one of the most actively developed projects in the open-source Big Data arena. It aims to be a general engine for large-scale data processing, supporting a number of cluster managers (e.g. YARN or Mesos, as well as Spark's own standalone mode) and a variety of distributed storage systems…

Best Practices · Python

How to Develop and Distribute Python Packages

This article contains some notes about the development of Python modules and packages, as well as a brief overview of how to distribute a package so that it is easy to install via pip. Firstly, let’s start with the distinction between modules and packages, which is something slightly different from…
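The excerpt stops before defining the two terms, so here is a minimal, standard-library-only sketch of the usual distinction: a module is a single `.py` file, while a regular package is a directory containing an `__init__.py` that can hold further modules. The names `greetings`, `mypkg` and `tools` are hypothetical and exist only for this demo, which writes them to a temporary directory and cleans up afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_module_vs_package():
    """Build a throwaway module and a regular package on disk, then import both."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # A module is a single .py file (hypothetical name: greetings.py).
        (root / "greetings.py").write_text(
            "def hello():\n    return 'hello from a module'\n"
        )
        # A regular package is a directory with an __init__.py,
        # which can hold further modules (hypothetical: mypkg/tools.py).
        pkg = root / "mypkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "tools.py").write_text(
            "def hello():\n    return 'hello from a package'\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make the freshly written files visible to the importer
        try:
            greetings = importlib.import_module("greetings")
            tools = importlib.import_module("mypkg.tools")
            mypkg = sys.modules["mypkg"]
            # Packages expose __path__ (where their submodules live); plain modules do not.
            return (
                greetings.hello(),
                tools.hello(),
                hasattr(mypkg, "__path__"),
                hasattr(greetings, "__path__"),
            )
        finally:
            # Undo the side effects so the demo leaves no trace behind.
            sys.path.remove(tmp)
            for name in ("mypkg.tools", "mypkg", "greetings"):
                sys.modules.pop(name, None)


print(demo_module_vs_package())
# ('hello from a module', 'hello from a package', True, False)
```

Since Python 3.3, directories without an `__init__.py` can also be imported as namespace packages (PEP 420); the sketch sticks to regular packages, which are what a pip-installable distribution typically ships.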

Data Visualisation · Javascript · Maps · Python

Mining Twitter Data with Python (and JS) – Part 7: Geolocation and Interactive Maps

Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we are ready to create some nice visualisations for our data,…
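A common first step towards an interactive map is to convert geotagged tweets into GeoJSON, a format that JavaScript mapping libraries such as Leaflet can load directly. The sketch below assumes each tweet is a parsed JSON dict whose `coordinates` field is either `None` or a GeoJSON `Point` with `[longitude, latitude]` order (note: longitude first); the sample tweets are hypothetical and not real data.

```python
import json


def tweets_to_geojson(tweets):
    """Collect geotagged tweets (parsed JSON dicts) into a GeoJSON FeatureCollection."""
    features = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:
            continue  # the user did not attach a location to this tweet
        features.append({
            "type": "Feature",
            # Assumed layout: already a GeoJSON Point, i.e. [longitude, latitude]
            "geometry": point,
            "properties": {"text": tweet.get("text", "")},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample input, not real tweets.
sample = [
    {"text": "Hello from London",
     "coordinates": {"type": "Point", "coordinates": [-0.1276, 51.5072]}},
    {"text": "No location on this one", "coordinates": None},
]
geo = tweets_to_geojson(sample)
print(json.dumps(geo, indent=2))
```

Only the first sample tweet ends up in the collection; the resulting dict can be written to a `.json` file with `json.dump` and served to the browser-side map.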

Functional Programming · Python

Functional Programming in Python

This is probably not the newest of topics, but I haven’t had the chance to dig into it before, so here we go. Python supports multiple programming paradigms, but it’s not best known for its Functional Programming style. As its own creator has mentioned before, Python hasn’t been heavily influenced by other functional languages,…
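For a concrete taste of the functional tools Python does offer, here is a short sketch of `map`, `filter` and `functools.reduce` (moved out of the built-ins in Python 3), next to the list comprehensions that are often considered the more idiomatic equivalent. The input list is arbitrary.

```python
from functools import reduce
from operator import add

numbers = [1, 2, 3, 4, 5, 6]

# map: apply a function to every element (returns a lazy iterator in Python 3)
squares = list(map(lambda n: n * n, numbers))      # [1, 4, 9, 16, 25, 36]
# filter: keep only the elements for which the predicate is true
evens = list(filter(lambda n: n % 2 == 0, numbers))  # [2, 4, 6]
# reduce: fold the sequence into a single value, starting from 0
total = reduce(add, numbers, 0)                    # 21

# The same transformations written as comprehensions
squares_lc = [n * n for n in numbers]
evens_lc = [n for n in numbers if n % 2 == 0]

print(squares, evens, total)
```

Neither style mutates `numbers`; both produce new values, which is the property functional programming cares about most.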

Data Mining · NLP · Python · Sentiment Analysis

Mining Twitter Data with Python (Part 6 – Sentiment Analysis Basics)

Sentiment Analysis is one of the interesting applications of text analytics. Although the term is often associated with sentiment classification of documents, broadly speaking it refers to the use of text analytics approaches applied to the set of problems related to identifying and extracting subjective material in text sources. This article continues the series on…
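To make "sentiment classification" concrete, the sketch below shows a deliberately simple lexicon-based classifier: count positive and negative terms and compare. This is not necessarily the technique the article uses, and the two word lists are hypothetical placeholders rather than a real sentiment lexicon.

```python
import re

# Hypothetical toy lexicon; real systems use curated resources or learned models.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "sad", "terrible"}


def polarity(text):
    """Return (#positive terms - #negative terms) in a lower-cased text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(text):
    """Map the polarity score to a coarse sentiment class."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("I love this, great game!"))  # positive
print(label("What an awful, sad day"))    # negative
print(label("The match starts at 8pm"))   # neutral
```

The approach ignores context entirely, so "not good" still counts as positive; handling negation and domain-specific vocabulary is where more careful methods come in.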

Graph Databases · Neo4j · Python

Getting started with Neo4j and Python

This article is a brief introduction to Neo4j, one of the most popular graph databases, and its integration with Python. Graph databases are a family of NoSQL databases, based on the concept of modelling your data as a graph, i.e. a collection of nodes (representing entities) and edges (representing relationships). The motivation behind…
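The "nodes plus edges" data model can be illustrated without a database at all. The sketch below is a hypothetical in-memory toy, not the Neo4j Python driver: nodes carry a label and properties, and relationships are typed and directed, mirroring the property-graph model that Neo4j implements. A comment shows roughly how the same data would be written in Cypher.

```python
class PropertyGraph:
    """Toy in-memory property graph: labelled nodes plus typed, directed relationships."""

    def __init__(self):
        self.nodes = {}  # node id -> {"label": str, "props": dict}
        self.edges = {}  # source id -> list of (relationship type, target id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, "props": props}

    def add_edge(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.setdefault(source, []).append((rel_type, target))

    def neighbours(self, node_id, rel_type=None):
        """Targets reachable by one outgoing edge, optionally filtered by type."""
        return [t for r, t in self.edges.get(node_id, [])
                if rel_type is None or r == rel_type]


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("bob", "Person", name="Bob")
g.add_node("python", "Language", name="Python")
# Roughly what Cypher would express as:
#   CREATE (a:Person {name: 'Alice'})-[:FOLLOWS]->(b:Person {name: 'Bob'})
g.add_edge("alice", "FOLLOWS", "bob")
g.add_edge("alice", "LIKES", "python")
print(g.neighbours("alice"))             # ['bob', 'python']
print(g.neighbours("alice", "FOLLOWS"))  # ['bob']
```

In a real graph database the relationships are stored and indexed natively, so traversals like `neighbours` do not require scanning the whole dataset.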

Data Mining · Data Visualisation · NLP · Python

Mining Twitter Data with Python: Part 5 – Data Visualisation Basics

A picture is worth a thousand tweets: more often than not, designing a good visual representation of our data can help us make sense of them and highlight interesting insights. After collecting and analysing Twitter data, the tutorial continues with some notions on data visualisation with Python. Tutorial Table of Contents: Part 1: Collecting data…