Big Data · Data Mining · Engineering · Python

Building Data Pipelines with Python and Luigi

For a data scientist, the emphasis of the day-to-day job is often more on R&D than on engineering. In the process of going from prototypes to production, though, some of the early quick-and-dirty decisions turn out to be sub-optimal and require a decent amount of effort to re-engineer. This usually slows down…

API · Python · Text Analytics

Easy Text Analytics with the Dandelion API and Python

In the past few weeks, I’ve been playing around with some third-party Web APIs for Text Analytics, mainly for some side projects. This article is a short write-up of my experience with the Dandelion API. Note: I’m not affiliated with dandelion.eu and I’m not a paying customer; I’m simply using their basic (i.e. free) plan…

MongoDB · NoSQL · Python

Getting Started with MongoDB and Python

MongoDB is a popular NoSQL database. It uses a document-oriented, JSON-like approach to represent data, making the integration of semi-structured data fairly easy. This article introduces PyMongo, the Python package for interacting with MongoDB, and covers basic interactions with the database. MongoDB Basics: As mentioned in the…

NLP · Text Analytics · Text Mining · Text Summarisation

A Brief Introduction to Text Summarisation

In this article, I’ll discuss some aspects of text summarisation, the process of analysing a text document, or a set of documents, in order to produce a summary of its content. The overall purpose is to reduce the amount of information that a user has to digest in order to understand whether reading the whole…

Elasticsearch · JavaScript · Python · Search

Building a Search-As-You-Type Feature with Elasticsearch, AngularJS and Flask (Part 2: Front-End)

This article is the second part of a tutorial that describes how to build a search-as-you-type feature based on Elasticsearch, Python/Flask and AngularJS. The first part discussed how to set up Elasticsearch and a microservice in Python/Flask, i.e. the back-end part of the system. It also provided an overall view of the architecture. In this…

Elasticsearch · JavaScript · Python · Search

Building a Search-As-You-Type Feature with Elasticsearch, AngularJS and Flask

Search-as-you-type is an interesting feature of modern search engines that gives users instant feedback on their search while they are still typing a query. In this tutorial, we discuss how to implement this feature in a custom search engine built with Elasticsearch and Python/Flask on the back-end side, and AngularJS for…

Big Data · Python · Spark

Getting Started with Apache Spark and Python 3

Apache Spark is a cluster computing framework, currently one of the most actively developed in the open-source Big Data arena. It aims to be a general engine for large-scale data processing, supporting a number of platforms for cluster management (e.g. YARN or Mesos, as well as Spark's native cluster manager) and a variety of distributed storage systems…