Big Data · Python · Spark

Getting Started with Apache Spark and Python 3

Apache Spark is a cluster computing framework and currently one of the most actively developed open-source projects in the Big Data arena. It aims to be a general engine for large-scale data processing, supporting a number of cluster managers (e.g. YARN or Mesos, as well as Spark's native one) and a variety of distributed storage systems… Continue reading Getting Started with Apache Spark and Python 3

Best Practices · Python

How to Develop and Distribute Python Packages

This article contains some notes on developing Python modules and packages, as well as a brief overview of how to distribute a package so that it is easy to install via pip. Modules vs Packages in Python Firstly, let's start from the distinction between modules and packages, which is something slightly different from… Continue reading How to Develop and Distribute Python Packages
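The module-vs-package distinction the post opens with can be shown concretely: a module is a single `.py` file, while a package is a directory containing an `__init__.py`. The sketch below builds both in a temporary directory and imports them (all file and function names here are illustrative, not from the post):

```python
# Demonstrate modules vs packages by creating them on disk and importing them.
import importlib
import os
import sys
import tempfile

base = tempfile.mkdtemp()

# A module is a single .py file.
with open(os.path.join(base, "mymodule.py"), "w") as f:
    f.write("GREETING = 'hello'\n")

# A package is a directory with an __init__.py, possibly containing
# further modules; __init__.py can re-export names from them.
pkg_dir = os.path.join(base, "mypackage")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "core.py"), "w") as f:
    f.write("def add(a, b):\n    return a + b\n")
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .core import add\n")

sys.path.insert(0, base)  # make both importable
mymodule = importlib.import_module("mymodule")
mypackage = importlib.import_module("mypackage")

print(mymodule.GREETING)    # attribute of the module
print(mypackage.add(2, 3))  # re-exported from mypackage.core
```

A pip-installable distribution wraps exactly this kind of package together with metadata (e.g. a `pyproject.toml`) so the directory can be built and uploaded to an index.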