PyData London 2016 write-up

Last weekend I was at the PyData London conference for three Pythonic days. Firstly, thanks to the organiser, volunteers, speakers, sponsors and everyone who has contributed in a way or another to make the event a great success.

This year I had the opportunity to contribute as member of the review committee, which means I had a glimpse at the behind-the-scenes and I know how many great proposals we had. With three days and three to four tracks running in parallel, there is room for a lot of Pythonic parley, yet unfortunately many good proposals had to be turned down due to time/space constraints. The programme turned out to be great nevertheless.

The three days were really intense so there is just too much to say, but I’ll try to summarise some of the take-home messages.

Tutorials: delivering a tutorial is difficult. Everything that could go wrong, will go wrong (big screen that goes bananas for 10 minutes, flaky Internet connection so a conda install takes ages, you mention it). Jupyter notebook makes life better, but I strongly feel for the speakers, so a big thank you for taking the time to prepare some quality material.

Topics of interest: some topics seem to capture most of the attention this year, in particular there was a lot of interest around data pipelines, deep learning and Bayesian stats. Unsurprising?

Keynotes: following the recent news on the LIGO project, Prof. Andreas Freise gave an introduction to gravitational waves, lasers, the latest achievements in physics and other cool things far beyond my understanding. Something I could understand and relate to is his way to describe how he needs to write code to carry on his job, but writing code is not his main job. This is true for many academics and researchers without a software engineering background, who were also the main audience of my talk on building data pipelines (luckily enough, scheduled right after the keynote in the same room).

The second keynote, given by Tetiana Ivanova, was about the beginning of her journey in Data Science without formal education. Some of the suggestions were sensible, in fact I recently shared some of the same ideas in a short talk to UCL students and post-docs who want to move to industry.

The third and last keynote was given by Travis Oliphant: CEO of Continuum Analytics, author of NumPy, creator of SciPy, Pythonista since the late 1990’s. His talk was about scaling up and scaling out the PyData stack. Things to watch out for: Numba and Dask. Really exciting stuff going on!

My talk: I presented “Building Data Pipelines in Python”, with a focus on the need to bring R&D and Engineering together, and how basic engineering principles can be beneficial even if your job is not all about writing code. After presenting a very similar talk at PyCon Italy, I found the audience in London to be a bit more on the academic side than I initially thought, which was perfect for my engineering rants. After the usual first few minutes of feeling awkward when speaking publicly, I started my discussion on unit testing and asked how many in the audience write unit tests regularly. Random guy from the audience: “What’s a unit test?”. Thank you kind stranger, you lifted my spirit and the rest of the talk was a breeze.

The slides of my talk are on my speakerdeck.

Last year it took several months to get the videos out, this year only one day! So this is the video of my talk: https://www.youtube.com/watch?v=7NzH1Gx8-4E

I had some interesting questions after the talk and I also had some nice conversations the day after. Apparently, I raised some interest on Luigi, in fact a few people told me how they really had to attend the other talk about using Luigi in production, deliverd by Pete Owlett from Deliveroo, after listening to mine (the room was overflowing so I couldn’t even get close!). There was also some genuine interest on unit testing, and a very interesting question was how to apply it when working with Jupyter notebooks.

Lighting talks: apparently, saving your Jupyter notebooks on git is an issue that is taken very seriously by the community. In fact, three speakers came up with different solutions for the same problem.

Organisation: hat off to the organisers and everyone involved, and see you at the PyData London meetup!

Get in touch if you also have a write up of the event:

@MarcoBonzanini

Some Thoughts on IWCS 2015

Last week I attended the 11th International Conference on Computational Semantics (IWCS 2015). This conference is the bi-yearly meeting of the ACL‘s Special Interest Group on Semantics and this edition was hosted by Queen Mary University’s Computational Linguistics Lab. The topics discussed at the conference revolve around computing, annotating, extracting and representing meaning in natural language. The format of the conference consisted in a first day of workshops (I attended Advances in Distributional Semantics, ADS) followed by three days of main event.

It was nice to be back at Queen Mary, where I studied for my MSc and PhD in Information Retrieval, and it was nice to have the opportunity to mix with a different academic crowd. In fact, a part from the local organisers, I hadn’t met any of the attendees before, and I only knew a couple of famous names. In particular, Hinrich Schuetze (probably best known for co-authoring a book on Natural Language Processing and one on Information Retrieval) gave a talk at the ADS workshop about The case against compositionality, and Yoshua Bengio (one of the most influencial figures in Deep Learning) gave one of the keynote speeches, about Deep Learning of Semantic Representations.

To confirm a feeling that I already had, I have to say that small, single-track conferences are in general more enjoyable than huge ones. You might not have an open bar reception in a 5+ stars fancy hotel, but the networking is much more relaxed, people are in general more approachable, and QA sessions are usually spot on. Of course it really depends on the venues, but my non-statistically significant experience tells me so. Moreover, in bigger venues a lot of attention goes to improving some baseline of some 0.1% accuracy (or whatever metric) without many details on the theoretical foundations (of course with exceptions). Smaller venues usually have the chance to dig deeper into what it is that really makes a model interesting, even when the results are less solid or the evaluation is on a small-ish scale.

Talking about evaluation, this was in my eyes the biggest difference with Information Retrieval conferences: scalability and large-scale evaluation have been rarely, if ever, mentioned. I understand that other venues like EMNLP are probably more suitable for these topics, but it was something that I noticed.

In general, it’s difficult to mention one particular talk, as they were all more or less interesting in my eyes, but one quote that stood out for me was an answer given by Prof. Bengio at the end of his keynote, regarding negation and quantification, and how a Neural Network model deals with them: “I don’t know. But it learns to do what it needs to do.”

As a final side-note, the social event/dinner was a boat trip on the Thames: looking at some well-known London landmarks from a different point of view was absolutely amazing. Well done to the organisers!

PyData London Meetup 2015-02-03

A quick update with my impressions on the last PyData London meet-up I attended this evening.

I missed the past couple of meet-ups because of some clash on my personal schedule, so this was my first time in the new venue, still close to Old Street, hosted by Lyst. A nice big room to accomodate many many people (more than 200 probably?), and a nice selection of craft beers.

We started with the usual initial community-related announcements, including the impressive achievement of the group: 1,000+ members on the meetup.com dedicated page! Well done guys!

The core topic of the evening was data visualisation. For this reason, the main candidate for Python Module Of The Month was (… drum roll …) matplotlib :-D

The first talk, “Thinking about data visualisation”, was given by Andy Kirk from Visualising Data. Thinking was an important keyword of the presentation: after an initial comparison between talent (e.g. artistic, creative) vs thinking, Andy went on discussing different aspects of thinking and how our thinking can provide more value to the visualisation we are proposing. Long story short, you don’t have to be an artist to create useful visualisation, assuming you take the context and the overall story into account.

The second talk, “Lies, damned lies and dataviz”, was given by Andrew Clegg from Etsy. Andrew started the presentation with some argument in support of providing good dataviz, and then he continued with a great sequence of examples of bad dataviz, discussing how some of them simply confuse the user without providing any additional insight, while others really trick the user into wrong conclusions. Whether these are damned lies or just bad design choices… well that’s another story.

Both presentations were really great, and both of them technology-agnostic, which is something I enjoyed for some reason.

During the final lightning-and-community-talks, the interesting news was the announcement of a new O’Reilly book on data visualisation with Python and Javascript (link to come).

Overall a very nice evening, congratulations to all the organisers.

PyData London meetup.com group:
http://www.meetup.com/PyData-London-Meetup/