How to Develop and Distribute Python Packages

This article contains some notes about the development of Python modules and packages, as well as a brief overview of how to distribute a package so that it is easy to install via pip.

Modules vs Packages in Python

First, let’s start with the distinction between modules and packages, which differs slightly from language to language.

In Python, a simple source file containing the definitions of functions, classes and variables is a module. Once your application grows, you can organise your code into different files (modules) so that you can keep your sources tidy and clean, and you can re-use some of the functionalities in other applications.

On the other hand, a package is a folder containing an __init__.py file, as well as other Python source files. Typically a package contains several modules and sub-packages.

For example, you could have a foobar.py file where you declare a hello() function. You can re-use the function in different ways:

# import whole module and use its namespace
import foobar
foobar.hello()
# import specific function in local namespace
from foobar import hello
hello()
# import specific function in local namespace, create an alias
from foobar import hello as hi
hi()
# import all module declarations in local namespace
from foobar import *
hello()

The last option is usually considered sub-optimal, because you’re going to pollute the local namespace causing potential name conflicts. For example, assuming you imported some maths libraries and you’re using the log() function, is it coming from math.log() or numpy.log()? I usually aim for clarity when I choose which option is more suitable for a particular case.
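The same kind of clash can be reproduced with the standard library alone. This is a minimal sketch that uses math and cmath (my own choice of modules, picked only because both define sqrt) in place of the math/numpy pair mentioned above:

```python
import cmath
import math

from math import *   # sqrt now refers to math.sqrt
from cmath import *  # sqrt is silently rebound to cmath.sqrt

# The bare name no longer tells you which library it came from.
print(sqrt is math.sqrt)   # False
print(sqrt is cmath.sqrt)  # True

# The qualified names stay unambiguous.
print(math.sqrt(4))
print(cmath.sqrt(-1))
```

Whichever star import runs last wins, and nothing warns you about it, which is why the qualified form is usually easier to read and to debug.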

Similarly, you can import a package, a particular definition, a sub-package, etc.
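As a rough illustration of those import forms, the sketch below builds a throwaway package in a temporary folder and imports from it. The package, module and function names are placeholders, and the files are generated on the fly only to keep the example self-contained:

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical names and contents).
root = Path(tempfile.mkdtemp())
pkg = root / "yourpackage"
sub = pkg / "sub_package"
sub.mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "some_module.py").write_text(
    "def hello():\n    return 'hello from some_module'\n"
)
(sub / "__init__.py").write_text("")
(sub / "more_modules.py").write_text("ANSWER = 42\n")

# Make the parent folder of the package visible to the import system.
sys.path.insert(0, str(root))

import yourpackage                                # the package itself
from yourpackage import some_module               # a module inside it
from yourpackage.some_module import hello         # a single definition
from yourpackage.sub_package import more_modules  # a module in a sub-package

print(some_module.hello())
print(hello())
print(more_modules.ANSWER)
```

In a real project the package would already live on disk, so only the import lines would be needed.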

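Note: the import statement will look for modules and packages in the working directory (more precisely, the folder of the script being run, or the current directory in an interactive session) as well as in the folders declared in the Python path. You can find out where your libraries are stored by looking at: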

import sys
print(sys.path)

The Python path can be extended with user-specific folders by setting the $PYTHONPATH environment variable.

This means that if you want to make a particular module/package available to an application, it must either be in the working directory or in one of the folders dedicated to Python libraries. The latter option is usually achieved via the creation of an installation script.
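To see the search path at work, here is a small sketch that starts a fresh interpreter twice: once without PYTHONPATH and once with it pointing at a folder that holds a hypothetical foobar.py. The folder, the file contents and the run_child helper are all made up for the illustration:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# A folder outside the working directory holding a hypothetical foobar.py.
lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "foobar.py").write_text(
    "def hello():\n    return 'hello from foobar'\n"
)

# An empty folder used as working directory for the child interpreter.
empty_dir = tempfile.mkdtemp()

snippet = "import foobar; print(foobar.hello())"

def run_child(pythonpath):
    """Run the snippet in a fresh interpreter; return (exit code, stdout)."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env, cwd=empty_dir, capture_output=True, text=True,
    )
    return result.returncode, result.stdout.strip()

without_path = run_child(None)          # import fails: non-zero exit code
with_path = run_child(str(lib_dir))     # import succeeds

print(without_path)
print(with_path)
```

The first run only fails as long as no module called foobar is installed in your environment, which is the assumption here.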

Setup Tools and setup.py

Historically, the component of the Python Standard Library for writing installation scripts was distutils. However, to overcome its limitations, setuptools is now the recommended option (distutils itself was deprecated and then removed from the standard library in Python 3.12).

By creating a setup.py script in the parent folder of your package, you can make it easy to install if you share it via GitHub or if you make it available for pip.

The basic structure of setup.py looks like:

from setuptools import setup, find_packages

long_description = 'Looong description of your package, e.g. a README file'

setup(name='yourpackage', # name your package
      packages=find_packages(), # picks up yourpackage and its sub-packages
      version='1.0.0',
      description='Short description of your package',
      long_description=long_description,
      url='http://example.org/yourpackage',
      author='Your Name',
      author_email='your.name@example.org',
      license='MIT') # choose the appropriate license

The source code of the package should be put into a folder named after the package itself, while the setup script should be in the parent directory together with the documentation. This is an example of source structure:

.
├── LICENSE
├── README.rst
├── setup.py
└── yourpackage
    ├── __init__.py
    ├── some_module.py
    ├── other_module.py
    └── sub_package
        ├── __init__.py
        └── more_modules.py

The LICENSE and README.rst files are documentation, the setup.py file is the installation script as above, while the whole source code of the package with its components is under the yourpackage folder.
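One detail worth checking with a nested layout like this one: setuptools only installs the packages listed in the packages argument, and sub-packages are not included implicitly. setuptools ships a find_packages() helper that walks a folder and returns every package it contains. The sketch below rebuilds the tree above with empty files in a temporary folder (my own scaffolding, and it assumes setuptools is installed) just to show what the helper reports:

```python
import tempfile
from pathlib import Path

from setuptools import find_packages

# Recreate the example layout with empty files in a temporary folder.
root = Path(tempfile.mkdtemp())
for folder in ("yourpackage", "yourpackage/sub_package"):
    (root / folder).mkdir(parents=True)
    (root / folder / "__init__.py").write_text("")
(root / "yourpackage" / "some_module.py").write_text("")
(root / "yourpackage" / "other_module.py").write_text("")
(root / "yourpackage" / "sub_package" / "more_modules.py").write_text("")

# Every folder with an __init__.py is reported, in dotted notation.
found = sorted(find_packages(where=str(root)))
print(found)  # ['yourpackage', 'yourpackage.sub_package']
```

In a setup.py sitting next to the package folder, calling find_packages() with no arguments searches the current directory in the same way.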

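You can install the package and make it available to any of your Python apps with the command below (with recent versions of pip and setuptools, running pip install . from the same folder is the preferred equivalent):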

python setup.py install

If you publish the above structure on a public repository, e.g. on GitHub, anyone could easily install it with:

git clone https://www.github.com/yourname/yourpackage
cd yourpackage
python setup.py install

PyPI as Public Repo

PyPI, the Python Package Index, also known as the CheeseShop, is where developers can publish their Python packages to make them available for easy installation via pip.

Once your package is ready to be published, you’ll need to register your account on PyPI. You should also register your new package on PyPI: you can do so using the web form on the PyPI website.

Once your account is ready, create a file called .pypirc in your home folder:

$ cat ~/.pypirc 

[distutils]
index-servers=pypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = your-username
password = your-password
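Since .pypirc is a plain INI file, a quick way to check that it is well-formed is to load it with configparser. This is only a sketch: it parses a sample written to a temporary file rather than your real ~/.pypirc, and the repository line is left out for brevity:

```python
import configparser
import tempfile
from pathlib import Path

sample = """\
[distutils]
index-servers=pypi

[pypi]
username = your-username
password = your-password
"""

# Written to a temporary file so the real ~/.pypirc is never touched.
path = Path(tempfile.mkdtemp()) / ".pypirc"
path.write_text(sample)

config = configparser.ConfigParser()
config.read(path)

# Each name listed under index-servers should have its own section.
servers = config["distutils"]["index-servers"].split()
for server in servers:
    section = config[server]
    # Report whether credentials are present without printing the password.
    print(server, section.get("username"), "password" in section)
```

Keep in mind that the password is stored in clear text, so the file should be readable by your user only.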

Now you’re ready to push your package to the public index:

python setup.py sdist upload

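The sdist command will create the package to distribute, while the upload command will push it to the public repository using the information that you stored in ~/.pypirc. Note that recent setuptools releases have dropped the upload command; the twine tool (twine upload dist/*) now fills that role and reads the same ~/.pypirc file.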

At this point, you can install your brand new Python package on any machine by typing:

pip install yourpackage
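Once the installation has finished, you can confirm from Python that the distribution is visible. A minimal sketch using importlib.metadata from the standard library (Python 3.8 or later); the installed_version helper is my own wrapper, not part of any library:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# Prints the version declared in setup.py, or None if it is not installed.
print(installed_version("yourpackage"))
```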

Conclusion

Organising your code into modules and packages will help keep your codebase clean. In particular, packing your code into meaningful packages will improve code re-use. There are only a few simple steps to follow in order to create a Python package that can be easily distributed, and if you decide to do so, the Python Package Index is the obvious choice.

Published by

Marco

Data Scientist
