Getting into Data Science presentation at Hisar Coding Summit 2021

Last week I had the opportunity to speak at the Hisar Coding Summit 2021, an event organised by students of the Hisar School of Istanbul. The remote format opened the doors for participants around the globe, but the audience was mainly high-school students with an interest in Data Science.

The title of my presentation, Getting into Data Science, already suggests the core topics: an overview on what Data Science is, some examples of interesting applications and an overview on the type of skill that are useful for a data scientist.

Link to the slides

Some of the core points I discussed:

  • Data Science is a three-way handshake between Computer Science, Statistics and Business Domain Knowledge. There are many Data Science Venn diagrams out there, with Drew Conway’s (that I referenced in the presentation) probably being the most recognised. One aspect of the Venn diagram representation that I don’t fully like is that it can be misread, suggesting that Data Scientists only exist within the intersection of the three disciplines (hence the representation of Data Scientists as Unicorns). The point of the three-way handshake is to suggest that Data Scientists’ skills can be found on the union, rather than the intersection, of such diagram (i.e. you don’t have to be an expert at everything to do Data Science).
  • “Data Scientist” is not the only job title in Data Science, there are plenty of other professionals whose skills are crucial for the positive outcome of a Data Science project, including e.g. data engineers, software engineers, business analysts, etc. There is also still some fuzziness on what data scientists are supposed to do, so different organisations will use job titles differently. I used Monica Rogati’s Data Science Hierarchy of Needs to illustrate this point, explaining how often people and companies tend to focus too much on the cool stuff at the top of the pyramid (AI and Deep Learning), which can only be achieved with solid foundations (reliable infrastructure and data handling) at the bottom of the pyramid.
  • Data Science Applications: too many to mention. One of the great aspects of Data Science is that it lets you work in almost any domain. Or if you prefer, any domain can benefit from Data Science. I offered some examples making references to previous presentations given at PyData London (either the meetup or our annual conference), citing Weather Forecasting, Healthcare, Biology, Journalism, Food Recommendations and more.
  • Data Science Skills: I used again the three-way handshake to discuss how the skills of a Data Scientist will lie somewhere on the union of Computer Science, Mathematic/Statistics and Domain Knowledge. We discussed this in terms of where to start and where to go next, rather than must-have. I find a lot of those “top 10 must-have Data Science skills” articles out there to be silly at best, or damaging at worst, because again they instil the notion that you have to master multiple disciplines before you can even start, intimidating people rather than encouraging them into Data Science. I hope my main message, “you don’t have to be an expert at everything to do Data Science”, came through and left the students eager to learn more.

The session was wrapped up with a few excellent questions (that I did not expect from high-school students!) ranging from data privacy, to ethics, to queries on state of the art computer vision and NLP.

Video Course: Practical Python Data Science Techniques

I’m happy to announce the recent release of my second video course,
Practical Python Data Science Techniques published with Packt Publishing.

VideoCourse-Cover

Links:

This video course follows my first introductory course (Data Analysis with Python) and provides the audience with recipe-like solutions to common Data Science problems.

In particular, with about 2.5 hours of material, the video course covers the following topics:

  1. Exploring Your Data
    This section covers some of the most common techniques related to loading data, performing exploratory analysis and cleaning your data to get them in the right shape.
  2. Dealing with Text
    describes the common pre-processing techniques that you need to deal with text, from tokenisation to normalisation, to calculating word frequencies.
  3. Machine Learning Problems
    describes the most common Machine Learning problems and how to tackle them using scikit-learn.
  4. Time Series and Recommender Systems
    The last section groups some miscellanous topics, in particulr Time Series Analysis and the basics to implement a recommender system.

More details about the content of the course are available on the PacktPub’s page, and of course you can check out the code examples on my GitHub (links on top of this page).

If you are a beginner you may also be interested in my other video course, Data Analysis with Python (see video course on PacktPub.com, course material on GitHub and course overview on this blog).

@MarcoBonzanini