Intervista Pythonista: Podcast Interview for the Italian Python Community

A few weeks ago I had the pleasure of chatting with Marco Santoni and Cesare Placanica from the Python Milano user group, as the first guest of their new Intervista Pythonista, a podcast about the Italian Python community. You can listen to the episode on Anchor/Spotify in Italian.

This blog post offers a very belated summary of the podcast episode for English readers. It’s not a full transcript but it sums up our chat.

Some of the points we discussed (questions from the hosts in bold):

  • How did you get into Python? And how do you use it nowadays?
    My first encounter with Python was random. At that time I was working as a software engineer, mostly on web applications (PHP and JavaScript, long before Angular/React and friends). Someone, probably from a local Linux User Group, mentioned Python and I had a first look. When I got into Natural Language Processing, it felt like a good excuse to study this new (for me) language a bit more and to look into the ecosystem (only NLTK at the time, as far as I remember). Later I used it for my MSc dissertation and throughout my PhD work. In the meantime the Python ecosystem for Data Science became more and more robust, and it became my first weapon of choice. Nowadays I use it as my main tool for all my Data Science work, and I teach it in my corporate training courses.
  • How do you keep your knowledge up to date [with the latest Python developments]?
    Through a variety of channels. Conferences and meetups are a great way to see what people are working on, and to be exposed to fresh ideas. These days in-person gatherings are on hold so that part is mostly missing, but different user groups are still producing a lot of good content published on YouTube. Twitter or other social media channels can also be useful to catch up with some Python news, or to bump into a new library or new articles. Of course it’s difficult to keep up with everything, so one has to be a bit selective when it comes to spending time digging into the details. When I’m really interested in a particular topic, after a first look at blogs and tutorials, I’d probably seek something more structured like a book or a video course.
  • How (and why) did you start a career as a solo consultant? Bonus: how do you find your clients?
    I started my Data Science consultancy firm in 2015, mainly looking for independence. Taking the first step was simple (register your company online at Companies House and you’re good to go). Over time I’ve learned, and I’m learning, all the other facets of how to run a business, but taking that first step was the crucial moment. On the topic of finding clients, most of my work comes through my network, e.g. via word-of-mouth from people I’ve worked with in the past, or through a recommendation from a person I’ve helped somehow in the past. With this in mind, having a presence at conferences/events and curating your personal network is essential.
  • How did you get into NLP?
    I was working on a search application so I had to learn more. I started learning about Information Retrieval methods like TF-IDF, using off-the-shelf tools. From there, things took off: I later studied Information Retrieval for my MSc and PhD, I developed an interest in Natural Language Processing at large and I’ve been involved in many NLP projects. These days, I work on a broad variety of Data Science projects but NLP remains my favourite topic.
  • Can you tell us about your experience in writing books?
    I wrote one book on Data Mining for Social Media, and developed two video courses on data science so far. It’s a lot of work! The reward is usually not on the financial side, unless you write the next Harry Potter series. The process gave me a lot of insights into the publishing industry as a whole and put me in touch with a variety of professionals in that industry, which was great. I learned bits and pieces related to editing, marketing and all the other steps that one doesn’t think about when starting a book. More importantly, it gave me the opportunity to polish my writing skills, which I think are essential in our profession, even more so for a consultant. I think for every professional in our field, improving your writing is a good investment that will always pay dividends in the long run.
  • Tell us about your experience as a Python trainer
    I’ve been teaching and training in tech, in different capacities, for more than twenty years now. In the early days of my career, it was a second job through a local non-profit organisation. Later it became more and more central and, after a short stint in academia, a few years ago I started offering corporate training courses as part of my consulting services. The demand for Python and Data Science is high, with many companies looking into the PyData stack for their data analytics and business intelligence needs. Most of my training courses these days are 2- or 3-day sessions with small groups of circa 10-12 people. Python is easy to pick up, so during those few days of training folks can learn a lot and feel productive, even people who are new to programming.
  • How has your training experience changed with remote work now being more prominent?
    Training remotely is something I’ve been doing for a few years, so when Covid restrictions forced companies to work from home, I was ready. Reading the room in a Zoom call is obviously more difficult compared to in-person sessions, but there are tricks one can implement to keep the engagement high, improving the overall enjoyment for the delegates. For example, short demos with frequent “try it yourself” moments bring up questions that help me gauge the students’ understanding of the subject. Jupyter notebooks are great for this because of their interactive nature. Splitting the class into groups of 2-3 people using breakout rooms is also very useful for interactive exercises that people can solve in a “pair programming” fashion.
  • Tell us about your community engagement (meet-ups, conferences, etc)
    In 2014 I started attending local meet-ups, and in particular I was regularly at PyData London, a new (at the time) meet-up around Python and Data Science. I was enjoying the atmosphere and the quality of the presentations, and I found myself coming back every month. Like many Python events, PyData London is community-driven: everything is run by volunteers. Shortly after the first few events, I got closer to the regulars and the organisers and started helping out with the monthly meet-ups and even with the annual conference, reviewing proposals, chairing sessions, etc. Since 2018 I’ve been the chair of our annual conference, and I’ve helped grow the conference to over 700 attendees. Meanwhile, I’ve also attended many other Python conferences giving talks and tutorials, in particular PyCon UK and PyCon Italy, but also EuroPython, various local PyData chapters in the UK, PyMunich and PyParis. The common thread is always the great community: you get the opportunity to meet interesting people in your field, talking shop while enjoying a relaxing atmosphere. I certainly recommend that everybody look for local events to connect with their peers and, if possible, present their work to boost their CV!

Many thanks to Marco and Cesare for having me as the first guest of their podcast, which already has a few episodes under its belt. Do check them out here!

PyData London 2018

Last weekend (April 27-29) we ran PyData London 2018, the fifth edition of our annual conference (we also have a monthly meet-up, currently with 7,200+ members).

The event is entirely run by volunteers, with the purpose of bringing the community together and raising money for NumFOCUS, the charity that provides financial support to open-source scientific computing projects.

This year I had the pleasure of chairing the conference together with Cecilia and Florian. The organisation started in September last year when the chairing committee was formed.

These are some of the highlights of the weekend:

  • A new and bigger venue, the Tower Hotel in front of the iconic Tower Bridge: we had about 330 delegates for the tutorials on Friday and 550 for the talks on Saturday and Sunday
  • A great programme with 4 keynotes, 12 tutorials, 36 talks and two sessions of lightning talks. With more than 200 proposals, the review committee did an amazing job (thanks to Linda for leading the effort)
  • A Beginners Bootcamp run the day before the conference by Conrad of PythonAnywhere
  • Community-driven hackathons: an Algorithmic Art Hackathon (led by Tariq and our friends at the Algorithmic Art Meet-up), a pandas sprint (led by Marc and the Python Sprints Meet-up), and a Politics-themed hackathon (led by John and Frank of PyData Bristol)
  • An Algorithmic Art Expo: our friends from the Algorithmic Art Meet-up brought in some cool toys showcasing their work
  • Diversity Round Table, organised by Gina Helfrich
  • Childcare: for the first time we’ve been able to offer an on-site creche, supporting parents who otherwise wouldn’t be able to enjoy the conference
  • Book signing with Steve Holden (Python in a Nutshell), Ian Ozsvald (High Performance Python) and Holden Karau (High Performance Spark); thanks to O’Reilly we had 60 paperback books as gifts to our attendees
  • Our Social Event with the now classic Pub Quiz organised by James Powell

For a flavour of what the event was like, you can check out the buzz on Twitter and our shared photo album.

Thanks to all the people who contributed to yet another great PyData event!

@MarcoBonzanini

Video Course: Practical Python Data Science Techniques

I’m happy to announce the recent release of my second video course, Practical Python Data Science Techniques, published with Packt Publishing.

Links:

This video course follows my first introductory course (Data Analysis with Python) and provides the audience with recipe-like solutions to common Data Science problems.

In particular, with about 2.5 hours of material, the video course covers the following topics:

  1. Exploring Your Data
    This section covers some of the most common techniques related to loading data, performing exploratory analysis and cleaning your data to get them in the right shape.
  2. Dealing with Text
    This section describes the common pre-processing techniques that you need to deal with text, from tokenisation to normalisation, to calculating word frequencies.
  3. Machine Learning Problems
    This section describes the most common Machine Learning problems and how to tackle them using scikit-learn.
  4. Time Series and Recommender Systems
    The last section groups some miscellaneous topics, in particular Time Series Analysis and the basics of implementing a recommender system.
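
To give a flavour of the text pre-processing steps mentioned in section 2, here is a minimal sketch using only the standard library. It is not taken from the course material: the function names and the deliberately naive tokenisation rule are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenise(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    # Lowercasing is a basic form of normalisation; the regex keeps runs
    # of letters, digits and apostrophes, and drops the punctuation.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenise(text))


sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('the', 3), ('mat', 2)]
```

A real project would probably rely on a proper tokeniser and a stop-word list, but the overall shape (tokenise, normalise, count) stays the same.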

More details about the content of the course are available on the Packt Pub page, and of course you can check out the code examples on my GitHub (links at the top of this page).

If you are a beginner you may also be interested in my other video course, Data Analysis with Python (see video course on PacktPub.com, course material on GitHub and course overview on this blog).

@MarcoBonzanini

PyCon Italy 2017 write-up

Last week I travelled to Florence, where I attended PyCon Otto, the 8th edition of the Italian Python Conference. As expected, it was yet another great experience with the Italian Python community and many international guests.

This year the very first day, Thursday, was beginners’ day, with introductory workshops run by volunteer mentors. Thanks to a cancelled flight, I missed out on this opportunity so I joined the party only for the main event.

On Friday, I ran another version of my tutorial on Natural Language Processing for beginners. The tutorial was oversubscribed and the organisers really made an effort to accommodate as many people as possible in the small training room, so in the end I had ~35 attendees. After the workshop, I had a lot of interesting conversations and some ideas on how to improve the material with additional exercises. Some credit for this is due to my friend Miguel Martinez, who contributed the text classification material for the first edition of the workshop.

As per tradition, at the end of the workshop I also ran a raffle to give away a free copy of my book, Mastering Social Media Mining with Python.

On Saturday, I gave a talk titled Word Embeddings for Natural Language Processing with Python (link to slides), in a way a natural follow-up to the tutorial with slightly more advanced concepts, but still tailored for beginners. The talk was really well received, and a lot of interesting questions and conversations came up.

Following the traditional social event on Saturday night (a huge fiorentina), Sunday was pretty much a mellow day, with the last few excellent talks, a light lunch and my journey back.

It was great to meet so many new and old friends! The quality of this community event was stellar, and this was possible thanks to the contributions of organisers, volunteers, mentors, speakers and all the attendees.

See you for PyCon Italy 2018!

PyCon UK 2016 write-up

Last week I had a long weekend at PyCon UK 2016 in Cardiff, and it’s been a fantastic experience! Great talks, great friends/colleagues and lots of ideas.

On Monday 19th, the last day of the conference, my friend Miguel and I ran a tutorial/workshop on Natural Language Processing in Python (the GitHub repo contains the Jupyter notebooks we used as well as some slides for an introduction).

Our NLP tutorial

Since I’ve already mentioned it, I’ll start from the end :)

The tutorial was tailored for NLP beginners and, as I mentioned explicitly at the very beginning, I wasn’t there to impress the experts. Rather, the whole point was to get the attendees a bit curious about Natural Language Processing, and to show them what you can do with a few lines of Python.

Overall, I think we were quite lucky as we had the perfect audience: the right number of people (20 or so) with a bit of Python knowledge but not much NLP knowledge.

We only had some minor hiccups with the installation process, which is something we’re going to work on to make it smoother and more beginner-friendly. In particular the things I’d like to improve are:

  • add some testing / pre-flight checks, e.g. “how do I know that the environment is set up correctly?” (Miguel has already added this)
  • support for Windows: I’m quite useless at troubleshooting Windows issues, but a couple of attendees had some trouble with the installation process; maybe some virtual machine setup would be helpful
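
As an illustration of the first point, a pre-flight check can be as simple as verifying that the required packages can be found before the tutorial starts. The sketch below is not the script from the tutorial repo: the package list and function name are assumptions, and it only checks that each top-level package is installed, not that it works.

```python
import importlib.util

# Hypothetical list of top-level packages; adjust it to the tutorial's needs.
REQUIRED = ["nltk", "sklearn", "gensim"]


def missing_packages(required):
    """Return the names in `required` that cannot be found for import."""
    # find_spec() returns None when a top-level module is not installed.
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Environment looks OK")
```

Attendees could run this once after the installation, and report the output if something is missing.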

I also think that having the material available in advance, so the attendees can start setting up the environment, is very helpful. Most of them were quite engaged and I received a couple of “bug reports” on-the-fly, and even a pull request that improved the installation process (thanks!)

Last but not least, I was also happy to give out a copy of my book (Mastering Social Media Mining with Python) that I had with me (the raffle was implemented on the spot through random.choice(), and the book went to Paivi from Django Girls).
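
For the curious, a raffle based on random.choice() takes only a few lines. This is a reconstruction rather than the code typed on the day, and the attendee names are placeholders.

```python
import random


def draw_winner(attendees):
    """Pick one attendee uniformly at random."""
    if not attendees:
        raise ValueError("no attendees to choose from")
    return random.choice(attendees)


# Placeholder names, not the real attendee list
attendees = ["attendee_1", "attendee_2", "attendee_3", "attendee_4"]
winner = draw_winner(attendees)
print("And the winner is:", winner)
```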

I’ll give a shorter version of this tutorial at PyCon Ireland later this year, so if you’re around, I’ll see you there :)

Unfortunately, the tutorials were not recorded so there is no video on-line, but the slides are in the GitHub repo so please dig in and send feedback if you have any.

The Open Day

Thursday 15th was “day zero” of the conference, hosted at Cardiff University. The ticket was free, although there was limited capacity. The day was aimed at introducing a new audience to Python and PyCon. We didn’t see much Python code on that day, as the talks were mainly for newcomers, yet we had a lot of food for thought. This is a great way to introduce more people to Python and to show them how the community is friendly and happy to get more beginners on board.

Teachers, Kids and Education

One of the main themes of the conference was Education. Friday 16th, the first day of the main event, was labelled “Teachers Day”, while Saturday 17th was “Kids Day”. The effort to make CS education more accessible for kids was very clear, and some of the initiatives were really spot-on. In particular, some of the kids were able to hack some small project together in a very short time, and they delivered a “show and tell” session at the end of the second day. I think their creativity, and the fact that they were standing in front of a crowd of 500+ developers to show what they had been working on during their day, were very impressive.

Community in the Broader Sense

Another aspect that became quite clear is the strength of the Python Community. Some representatives of PyCon Poland, PyCon Switzerland and Django Europe introduced their upcoming events. Some attendees with limited financial means were given the opportunity to attend through some form of financial support (including e.g. students from India).

Representatives from PyCon Namibia and PyCon Zimbabwe were also attending and they discussed some of the challenges they are facing while building a local community in their countries.

In particular, the work Jessica from PyNAM is carrying out with young learners is extremely inspiring and deserves more visibility (link to the video of her talk).

Accessibility for Everybody

One of the features that I’d never experienced at a conference before was the speech-to-text transcription. During the talks, the speech-to-text team was very busy writing down what the speakers were saying in real time. While this is sometimes considered an accessibility feature which might benefit only deaf users, it turned out that live captions are extremely beneficial for everybody. Firstly, not all the non-deaf attendees have perfect hearing. Secondly, not everybody is a native English speaker (both speakers and audience), so a word might be missed, or an accent might cause some confusion. Lastly, not every attendee is paying full attention to every talk for the whole talk: sometimes towards the end of the day, you just switch off for a moment and the live captions allow you to catch up.

Providing an accessibility feature turned out to be beneficial for everybody.

Shout out to the Organisers

Organising such a big event (500+ attendees) is not an easy task, so all the people who have worked hard to make this conference happen deserve a big round of applause. Not naming names here, but if you’ve been involved, thanks!

Being Interviewed about NLP

This was a bit random, in a very pleasant way. On Saturday, Miguel, Lev from RaRe Technologies and I spent some time with Kate Jarmul, who by the way had just introduced her book on data wrangling, and also delivered a tutorial on the topic. The conversation was about our views, in the broader sense, on NLP / Text Analytics: how we got into this field, how we see this field evolving and so on. Apparently, this was an interview with some experts of the field, for a piece she’s writing for the O’Reilly blog (I should put an amazed emoticon here).

Using Python for …

The breadth of the topics discussed during the conference was really amazing. I think this kind of event is a great way to see what people are working on and how the tools we use every day are used by other people.

I’m not going to name any talk in particular, because there are too many good talks that deserve to be mentioned.

In terms of topics, some fields that are well covered by Python are:

  • Data Science (and related topics like data cleaning, NLP and machine learning)
  • Web development (with Django and so many interesting libraries)
  • Electronics and robotics (with Raspberry Pi, micro:bit, MicroPython etc)
  • you name it :)

I’m probably not saying anything new here, but it was nice to see it first-hand and step outside my data-sciency comfort zone.

Summary

Thanks to everybody who contributed to this event, and see you in Cardiff for PyCon UK 2017!

Mastering Social Media Mining with Python

Great news, my book on data mining for social media is finally out!

The title is Mastering Social Media Mining with Python. I’ve been working with Packt Publishing over the past few months, and in July the book was finalised and released.

Links:

As part of Packt’s Mastering series, the book assumes the readers already have some basic understanding of Python (e.g. for loops and classes), but more advanced concepts are discussed with examples. No particular experience with Social Media APIs and Data Mining is required. With 300+ pages, by the end of the book, the readers should be able to build their own data mining projects using data from social media and Python tools.

A bird’s eye view on the content:

  1. Social Media, Social Data and Python
    • Introduction to Social Media and Social Data: challenges and opportunities
    • Introduction to Python tools for Data Science
    • Overview of the use of public APIs to interact with social media platforms
  2. #MiningTwitter: Hashtags, Topics and Time Series
    • Interacting with the Twitter API in Python
    • Twitter data: the anatomy of a tweet
    • Entity analysis, text analysis, time series analysis on tweets
  3. Users, Followers, and Communities on Twitter
    • Analysing who follows whom
    • Mining your followers
    • Mining communities
    • Visualising tweets on a map
  4. Posts, Pages and User Interactions on Facebook
    • Interacting with the Facebook Graph API in Python
    • Mining your posts
    • Mining Facebook Pages
  5. Topic analysis on Google Plus
    • Interacting with the Google Plus API in Python
    • Finding people and pages on G+
    • Analysis of notes and activities on G+
  6. Questions and Answers on Stack Exchange
    • Interacting with the StackOverflow API in Python
    • Text classification for question tags
  7. Blogs, RSS, Wikipedia, and Natural Language Processing
    • Blogs and web pages as social data
    • Web scraping with Python
    • Basics of text analytics on blog posts
    • Information extraction from text
  8. Mining All the Data!
    • Interacting with many other APIs and types of objects
    • Examples of interaction with YouTube, Yelp and GitHub
  9. Linked Data and the Semantic Web
    • The Web as Social Media
    • Mining relations from DBpedia
    • Mining geo coordinates

The detailed table of contents is shown on the Packt Pub page. Chapter 2 is also offered as a free sample.

Please have a look at the companion code for the book on my GitHub, so you can get an idea of the applications discussed in the book.

PyData London 2016 write-up

Last weekend I was at the PyData London conference for three Pythonic days. Firstly, thanks to the organisers, volunteers, speakers, sponsors and everyone who has contributed in one way or another to make the event a great success.

This year I had the opportunity to contribute as a member of the review committee, which means I had a glimpse behind the scenes and I know how many great proposals we had. With three days and three to four tracks running in parallel, there is room for a lot of Pythonic parley, yet unfortunately many good proposals had to be turned down due to time/space constraints. The programme turned out to be great nevertheless.

The three days were really intense so there is just too much to say, but I’ll try to summarise some of the take-home messages.

Tutorials: delivering a tutorial is difficult. Everything that can go wrong will go wrong (big screen that goes bananas for 10 minutes, flaky Internet connection so a conda install takes ages, you name it). Jupyter notebook makes life better, but I strongly feel for the speakers, so a big thank you for taking the time to prepare some quality material.

Topics of interest: some topics seemed to capture most of the attention this year. In particular, there was a lot of interest around data pipelines, deep learning and Bayesian stats. Unsurprising?

Keynotes: following the recent news on the LIGO project, Prof. Andreas Freise gave an introduction to gravitational waves, lasers, the latest achievements in physics and other cool things far beyond my understanding. Something I could understand and relate to is the way he described how he needs to write code to carry out his job, even though writing code is not his main job. This is true for many academics and researchers without a software engineering background, who were also the main audience of my talk on building data pipelines (luckily enough, scheduled right after the keynote in the same room).

The second keynote, given by Tetiana Ivanova, was about the beginning of her journey in Data Science without formal education. Some of the suggestions were sensible; in fact, I recently shared some of the same ideas in a short talk to UCL students and post-docs who want to move to industry.

The third and last keynote was given by Travis Oliphant: CEO of Continuum Analytics, author of NumPy, creator of SciPy, Pythonista since the late 1990s. His talk was about scaling up and scaling out the PyData stack. Things to watch out for: Numba and Dask. Really exciting stuff going on!

My talk: I presented “Building Data Pipelines in Python”, with a focus on the need to bring R&D and Engineering together, and how basic engineering principles can be beneficial even if your job is not all about writing code. After presenting a very similar talk at PyCon Italy, I found the audience in London to be a bit more on the academic side than I initially thought, which was perfect for my engineering rants. After the usual first few minutes of feeling awkward when speaking publicly, I started my discussion on unit testing and asked how many in the audience write unit tests regularly. Random guy from the audience: “What’s a unit test?”. Thank you kind stranger, you lifted my spirit and the rest of the talk was a breeze.

The slides of my talk are on my speakerdeck.

Last year it took several months to get the videos out, this year only one day! So this is the video of my talk: https://www.youtube.com/watch?v=7NzH1Gx8-4E

I had some interesting questions after the talk and I also had some nice conversations the day after. Apparently, I raised some interest in Luigi: in fact, a few people told me that after listening to my talk they really had to attend the other talk about using Luigi in production, delivered by Pete Owlett from Deliveroo (the room was overflowing so I couldn’t even get close!). There was also some genuine interest in unit testing, and a very interesting question was how to apply it when working with Jupyter notebooks.

Lightning talks: apparently, saving your Jupyter notebooks on git is an issue that is taken very seriously by the community. In fact, three speakers came up with different solutions for the same problem.

Organisation: hats off to the organisers and everyone involved, and see you at the PyData London meetup!

Get in touch if you also have a write-up of the event:

@MarcoBonzanini

PyCon Italia / PyData Italy 2016 Write-Up

Last week I travelled to Florence to attend PyCon Sette, the seventh edition of the Italian Python Conference, born 10 years ago and held annually (with three editions of EuroPython in between).

First off, I have something to admit: as this was my first time at PyCon Italia, clearly I didn’t know what I was missing. After being overly busy with work and side projects, this is the perfect excuse to resume the blog.

Florence

The city doesn’t need much presentation: it’s simply one of the most beautiful cities in the world. I hadn’t been there for a few years but things don’t seem to be very different from a tourist’s point of view. The craft beer scene is booming, but at the same time culinary traditions are well preserved. Both of these are big thumbs-up for me. The best random moment of my trip: getting lost in the back streets of the old city centre, and then finding a dodgy hole-in-the-wall place that sells incredible focaccia and panini.

The Conference

PyCon Sette can be summarised as three intense days of Python, with more than 500 attendees. The first day was opened by Alex Martelli with a keynote about exception handling in Python 2 vs Python 3. Apart from the keynotes, at any given time we had between 4 and 6 parallel sessions of talks or trainings. I decided to stick to the PyData track for the whole time, although the other tracks also featured some interesting talks. Some of the tracks were related to a particular sub-community, with PyData and DjangoVillage having a strong presence, but Odoo, DjangoGirls and the Italian Postgres User Group are also worth mentioning.

I listened to many interesting talks. Off the top of my head, a few to remember: the talk about Internet of Things by Stefano Terna of TomorrowData.io (also winners of the start-up contest), the one about deployment of scikit-learn models in the cloud by Alex Casalboni and an interesting one about Functional Programming and Dask by Holger Peters.

Overall, hats off to the organisers. In particular, I had some conversations with Valerio Maggio who is the founder of PyData Italy. We exchanged some opinions about the conference and the community in the broader sense. Hopefully the interest around Data Science in Italy will keep rising, so maybe several local events throughout the year will be held, rather than having just one big national event per year.

My Talk

On Saturday, I gave a talk on Building Data Pipelines in Python. I have written about building data pipelines with Luigi before, but this talk gave me the opportunity to look at the bigger picture. The general message was that Research and Engineering are different disciplines, but we (data-sciency and researchy people) can benefit from trying to meet in the middle. In particular, good engineering practices can help the less engineering-oriented researchers in their day-to-day mundane tasks. After opening the discussion on the overall topic, I had a brief moment of ranting about unit testing (or the lack of testing culture in some academic circles), I introduced Luigi as a workflow manager to build pipelines in Python and I closed with an overview of logging (described by Alex Martelli in his keynote as something that scares people off, at least initially) and a consideration about using good engineering practices in research.

The talk was addressed to beginners and to the less engineering-savvy PyData users, so expert software engineers probably didn’t benefit much from it. I had a good response anyway, with several people coming up after the talk for a chat. All in all, if at least one researcher looks into testing or decides to try one of the workflow managers I mentioned, I’d say I’ve reached my goal.

The slides of my talk are on my speakerdeck (videos will be on-line soon).

See you next year in Florence!

Retrocomputing and Python: import turtle

My first experience with something related to programming was back in middle school. From time to time, our Math-and-Science substitute teacher used to walk us to the computer room, which was full of shiny Commodore 64 machines, where we had a lot of fun (sort of) with a graphic tool called turtle. What we were trying to do was simply to give a list of instructions to a turtle-shaped cursor, so it could move on the screen and draw some colourful shapes.

Back in those days, we didn’t even realise that we were doing something programming-related: we simply thought we were skipping Math for one day. Fast-forward a few years, I found out about Logo, its value as an educational programming language and Turtle graphics as one of Logo’s key features.

Given the festive spirit of these days, I thought I’d give a shot at the turtle package, part of the Python standard library ;)

Quick Intro on Turtle Graphics

Python has its own implementation of the turtle as part of the standard library (see documentation here). It uses the tkinter module for the underlying graphics, so it has to be run with a version of Python with Tk support.

If you’ve never heard of Turtle Graphics, these are some of the core concepts:

  • The turtle has a position (x, y coordinates) and an orientation
  • The orientation can be changed with right/left commands, e.g. right(90) will rotate 90 degrees clockwise
  • The position can be changed with forward/backward commands, or by setting the coordinates explicitly
  • The turtle is also called pen: when the pen is down, moving the turtle will draw a line
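
To make these concepts concrete, here is a minimal sketch of the state a turtle keeps track of. The MiniTurtle class is a toy model written for this post; it is not how the turtle module is implemented, and it only records line segments instead of drawing them.

```python
import math


class MiniTurtle:
    """Toy model of the turtle state: a position, a heading and a pen."""

    def __init__(self):
        self.x, self.y = 0.0, 0.0   # start at the origin
        self.heading = 0.0          # degrees; 0 points right (3 o'clock)
        self.pen_is_down = True
        self.lines = []             # segments "drawn" so far

    def left(self, angle):
        # counter-clockwise rotation
        self.heading = (self.heading + angle) % 360

    def right(self, angle):
        # clockwise rotation
        self.heading = (self.heading - angle) % 360

    def forward(self, distance):
        rad = math.radians(self.heading)
        new_x = self.x + distance * math.cos(rad)
        new_y = self.y + distance * math.sin(rad)
        if self.pen_is_down:
            # moving with the pen down leaves a line behind
            self.lines.append(((self.x, self.y), (new_x, new_y)))
        self.x, self.y = new_x, new_y

    def backward(self, distance):
        self.forward(-distance)

    def penup(self):
        self.pen_is_down = False

    def pendown(self):
        self.pen_is_down = True


t = MiniTurtle()
t.forward(100)   # draws a line along the x axis
t.right(90)      # now pointing down (6 o'clock)
t.penup()
t.forward(50)    # moves without drawing
print(round(t.x), round(t.y), t.heading, len(t.lines))
```

The real turtle.Turtle object offers the same commands (and many more), with the drawing handled by Tk.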

import turtle

The starting point is simply to import the turtle module. A turtle program will have a turtle.Screen object as a drawing canvas, and a turtle.Turtle object as a pen.

Let’s consider this first example:

import turtle

if __name__ == '__main__':
    win = turtle.Screen()

    turt = turtle.Turtle()
    turt.forward(100)
    turt.left(90)
    turt.forward(30)
    turt.color("red")
    turt.forward(30)

    win.mainloop()

This will produce the following:

Turtle Example

The turtle is initially oriented towards the right-hand side of the screen, i.e. towards 3 o’clock. Moving forward will produce the initial black line. As you can see, the colour can be changed later using the color() method of the turtle.

Festive Turtle

This section shows a more complex example. The full code is available on GitHub.

import turtle

if __name__ == '__main__':
    wn = turtle.Screen()

    my_turtle = turtle.Turtle()

    # start drawing the tree
    my_turtle.color("darkgreen")
    my_turtle.pensize(5)
    my_turtle.begin_fill()
    # the right half of the tree
    my_turtle.forward(100)
    my_turtle.left(150)
    my_turtle.forward(90)
    my_turtle.right(150)
    my_turtle.forward(60)
    my_turtle.left(150)
    my_turtle.forward(60)
    my_turtle.right(150)
    my_turtle.forward(40)
    my_turtle.left(150)
    my_turtle.forward(100)
    # the left half of the tree
    my_turtle.left(60)
    my_turtle.forward(100)
    my_turtle.left(150)
    my_turtle.forward(40)
    my_turtle.right(150)
    my_turtle.forward(60)
    my_turtle.left(150)
    my_turtle.forward(60)
    my_turtle.right(150)
    my_turtle.forward(90)
    my_turtle.left(150)
    my_turtle.forward(133)

    my_turtle.end_fill()
    # the trunk
    my_turtle.color("brown")
    my_turtle.pensize(1)
    my_turtle.begin_fill()

    my_turtle.right(90)
    my_turtle.forward(70)
    my_turtle.right(90)
    my_turtle.forward(33)
    my_turtle.right(90)
    my_turtle.forward(70)

    my_turtle.end_fill()

    # the star, see similar example on python.org
    my_turtle.penup()
    my_turtle.setpos(-17, 110)
    my_turtle.color("gold")
    my_turtle.begin_fill()
    my_turtle.pendown()
    for _ in range(36):
        my_turtle.forward(40)
        my_turtle.left(170)
    my_turtle.end_fill()


    # some colourful balls
    def ball(trt, x, y, size=10, colour="red"):
        trt.penup()
        trt.setpos(x, y)
        trt.color(colour)
        trt.begin_fill()
        trt.pendown()
        trt.circle(size)
        trt.end_fill()

    ball(my_turtle, 95, -5)
    ball(my_turtle, -110, -5)
    ball(my_turtle, 80, 40, size=7, colour="gold")
    ball(my_turtle, -98, 40, size=7, colour="gold")
    ball(my_turtle, 70, 70, size=5)
    ball(my_turtle, -93, 70, size=5)


    my_turtle.hideturtle()
    wn.mainloop()

And this is the output:

Turtle XMas Tree
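
The tree outline in the listing is a long run of turn-then-move calls, so it could also be expressed as data and replayed in a loop. The sketch below is one possible refactoring, not the code used for the picture: draw_path and RecordingTurtle are names made up for this example, and the recorder is only there to check the sequence without opening a Tk window.

```python
def draw_path(trt, steps):
    """Replay (angle, distance) steps on any turtle-like object.

    A positive angle turns left and a negative angle turns right,
    mirroring how left() treats negative values in the turtle module.
    """
    for angle, distance in steps:
        if angle:
            trt.left(angle)
        trt.forward(distance)


# right half of the tree, copied from the listing above
RIGHT_HALF = [
    (0, 100),
    (150, 90),
    (-150, 60),
    (150, 60),
    (-150, 40),
    (150, 100),
]


class RecordingTurtle:
    """Stand-in that records calls instead of drawing them."""

    def __init__(self):
        self.calls = []

    def left(self, angle):
        self.calls.append(("left", angle))

    def forward(self, distance):
        self.calls.append(("forward", distance))


recorder = RecordingTurtle()
draw_path(recorder, RIGHT_HALF)
print(recorder.calls[:3])
```

With the real module, the same function can be called as draw_path(my_turtle, RIGHT_HALF) after setting the colour and pen size.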

Summary

Turtle graphics is a great educational tool to introduce kids to programming. Grown-ups can use it as well, for a bit of nostalgic fun ;)

The full code for the demo is available on GitHub.

@MarcoBonzanini

Adding Slack Notifications to a Luigi Pipeline in Python

In a previous article, I described how to build a data pipeline in Python using Luigi, a workflow manager written in Python and open sourced by Spotify. I also had the opportunity to give a short talk about Luigi at the local PyData London meetup (see slides).

One of the nice features of Luigi is the possibility of receiving e-mail notifications on error. While this is a useful feature, it’s tailored to errors only, so effectively you don’t know if the Luigi pipeline has completed its execution successfully unless you check manually. As I wanted to receive notifications on Slack, including in case of success, I started looking around for options.

I ended up developing my own solution: https://github.com/bonzanini/luigi-slack. This blog post is a brief overview of how to use this Python package with your Luigi pipeline.

Getting started with luigi-slack

From your organisation’s Slack page (e.g. yourname.slack.com) you can add a Bot integration. The setup is very quick, and you’ll receive a token that you’ll need to use to interact with the Slack API.

You can get the bleeding edge version of luigi-slack from the GitHub link above, but beware that this is a work in progress. A somewhat stable version is available from the cheese shop:

pip install luigi-slack

The key points of this package are:

  • Support for Python 3
  • Easy-to-use interface

Regarding the first point, the discussion on choosing Python 2 vs Python 3 is never-ending and I’m not going there in this post. For a greenfield project, I prefer to use Python 3 rather than a version with a sunset date already decided. The support for Python 2 in luigi-slack is best-effort (and of course pull requests are always welcome).

In terms of easy-to-use interface, I borrowed the nice idea of using a context manager from luigi-monitor, because it makes it easy to integrate the library with an existing pipeline.

For example, given the basic code to run a Luigi pipeline which ends with the task YourTaskClass:

import luigi

if __name__ == '__main__':
    luigi.run(main_task_cls=YourTaskClass)

All we need in order to have Slack notifications is to refactor as follows:

import luigi
from luigi_slack import SlackBot, notify

if __name__ == '__main__':
    slacker = SlackBot(token='my-token',
                       channels=['mychannel', 'anotherchannel'])
    with notify(slacker):
        luigi.run(main_task_cls=YourTaskClass)
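
For readers less familiar with context managers, the general pattern can be sketched as follows. This is not the luigi-slack implementation (which tracks Luigi events rather than exceptions); notify_sketch and its send argument are hypothetical names used only to show why wrapping the run in a with block needs so little refactoring.

```python
from contextlib import contextmanager


@contextmanager
def notify_sketch(send):
    """Report the outcome of the wrapped block through `send`.

    `send` is any callable accepting a string; it is a hypothetical
    stand-in for whatever actually posts the message to Slack.
    """
    try:
        yield
    except Exception as exc:
        send("FAILURE: {}".format(exc))
        raise  # keep the original error visible to the caller
    else:
        send("SUCCESS")


messages = []
with notify_sketch(messages.append):
    pass  # the pipeline would run here

print(messages)
```

The code inside the with block stays exactly as it was, which is what makes this interface easy to bolt onto an existing pipeline.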

Configuration Options for luigi-slack

The SlackBot takes a number of arguments. Besides the token, which allows you to connect to your organisation’s Slack, all the other parameters are optional:

  • channels (default empty list) is the list of channel names that you want to push the notifications to. For the channel name, you don’t need the initial # symbol. You can also deliver the notifications to a single account, by using the @username syntax
  • events (default to [FAILURE]) is the list of event types, as defined in luigi_slack, that you want to track
  • max_events (default to 5) is the max number of events of a given type that you want to report. With more than max_events events of the same type, a “please check logs” message is reported instead
  • username (default to “Luigi-slack Bot”) is the screen name of your bot
  • task_representation (default to str) is the function used to represent the task in the notification (see explanation below)
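
The max_events rule described above can be illustrated with a small stand-alone sketch. The summarise_events function is hypothetical: it mimics the documented behaviour, not the actual luigi-slack code or its message wording.

```python
from collections import defaultdict


def summarise_events(events, max_events=5):
    """Group (event_type, task_name) pairs into notification lines."""
    by_type = defaultdict(list)
    for event_type, task_name in events:
        by_type[event_type].append(task_name)

    lines = []
    for event_type, tasks in by_type.items():
        if len(tasks) > max_events:
            # too many events to list: point the reader to the logs instead
            lines.append("{}: {} tasks, please check logs".format(
                event_type, len(tasks)))
        else:
            lines.append("{}: {}".format(event_type, ", ".join(tasks)))
    return lines


events = [("FAILURE", "TaskA"), ("FAILURE", "TaskB")]
print(summarise_events(events, max_events=5))
print(summarise_events(events, max_events=1))
```

Capping the report this way keeps a badly failing pipeline from flooding the channel with one message per task.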

In Luigi, representing a task as a string will print the task_id attribute of a luigi.Task, which includes the class name as well as all the parameters. In other words, it looks like:

MyTask(param1="some_value", param2="other_value", your_secret_param="your_secret_value", …)

With a huge number of parameters that make the notification less readable, or with sensitive parameters that you don’t want to send around in the Slack chat room, it makes sense to display the task in a more conservative way. An example of custom string representation could be:

def custom_task_representation(task):
    return "{}(...)".format(task.__class__.__name__)

Once we pass the function as task_representation argument of the SlackBot, the task will appear in the notifications as:

MyTask(...)

Keep in mind that an instance of a Luigi task is identified by the class name AND the value of its parameters, which is why the task_id includes them all. In other words, with a more compact representation like the one proposed in the above snippet, you won’t be able to distinguish between tasks with the same class name but different param values. You’ll need to customise the function based on your needs.
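
As a middle ground, a representation could keep the parameters but mask the sensitive ones. The sketch below is only one possible approach: masked_task_representation and its hidden argument are made-up names, and it assumes the task exposes its parameters through a param_kwargs dictionary. A plain stand-in class is used so the example runs without Luigi installed.

```python
def masked_task_representation(task, hidden=("your_secret_param",)):
    """Show class name and parameters, masking the ones listed in `hidden`."""
    # assumption: the task exposes its parameters as a dict called param_kwargs
    params = getattr(task, "param_kwargs", {})
    shown = []
    for name in sorted(params):
        value = "***" if name in hidden else params[name]
        shown.append("{}={}".format(name, value))
    return "{}({})".format(task.__class__.__name__, ", ".join(shown))


class MyTask:
    """Plain stand-in for a luigi.Task, so the sketch runs without Luigi."""

    def __init__(self, **params):
        self.param_kwargs = params


task = MyTask(param1="some_value", your_secret_param="your_secret_value")
print(masked_task_representation(task))
```

Tasks with different visible parameters remain distinguishable, while the masked values never reach the chat room.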

Summary

I’m developing a Python package to add Slack notification support to a Luigi pipeline, with a simple interface, a few optional configuration parameters, and minimal requirements in terms of refactoring.

The code is available at https://github.com/bonzanini/luigi-slack, and you can install the Python package with:

pip install luigi-slack

As this is a work in progress, it’s not widely tested, and the interface could change. Comments and pull requests are welcome.