Getting into Data Science presentation at Hisar Coding Summit 2021

Last week I had the opportunity to speak at the Hisar Coding Summit 2021, an event organised by students of the Hisar School of Istanbul. The remote format opened the doors to participants from around the globe, but the audience was mainly high-school students with an interest in Data Science.

The title of my presentation, Getting into Data Science, already suggests the core topics: an overview of what Data Science is, some examples of interesting applications, and an overview of the types of skills that are useful for a data scientist.

Link to the slides

Some of the core points I discussed:

  • Data Science is a three-way handshake between Computer Science, Statistics and Business Domain Knowledge. There are many Data Science Venn diagrams out there, with Drew Conway’s (which I referenced in the presentation) probably being the most recognised. One aspect of the Venn diagram representation that I don’t fully like is that it can be misread as suggesting that Data Scientists only exist within the intersection of the three disciplines (hence the representation of Data Scientists as Unicorns). The point of the three-way handshake is to suggest that Data Scientists’ skills can be found in the union, rather than the intersection, of the circles in such a diagram (i.e. you don’t have to be an expert at everything to do Data Science).
  • “Data Scientist” is not the only job title in Data Science: there are plenty of other professionals whose skills are crucial for the positive outcome of a Data Science project, including data engineers, software engineers and business analysts. There is also still some fuzziness about what data scientists are supposed to do, so different organisations will use job titles differently. I used Monica Rogati’s Data Science Hierarchy of Needs to illustrate this point, explaining how people and companies often focus too much on the cool stuff at the top of the pyramid (AI and Deep Learning), which can only be achieved with solid foundations (reliable infrastructure and data handling) at the bottom of the pyramid.
  • Data Science Applications: too many to mention. One of the great aspects of Data Science is that it lets you work in almost any domain. Or, if you prefer, any domain can benefit from Data Science. I offered some examples, referring to previous presentations given at PyData London (either the meetup or our annual conference) and citing Weather Forecasting, Healthcare, Biology, Journalism, Food Recommendations and more.
  • Data Science Skills: I again used the three-way handshake to discuss how the skills of a Data Scientist will lie somewhere in the union of Computer Science, Mathematics/Statistics and Domain Knowledge. We discussed this in terms of where to start and where to go next, rather than in terms of must-haves. I find a lot of those “top 10 must-have Data Science skills” articles out there to be silly at best, or damaging at worst, because again they instil the notion that you have to master multiple disciplines before you can even start, intimidating people rather than encouraging them into Data Science. I hope my main message, “you don’t have to be an expert at everything to do Data Science”, came through and left the students eager to learn more.

The session wrapped up with a few excellent questions (which I did not expect from high-school students!), ranging from data privacy and ethics to queries on state-of-the-art computer vision and NLP.

Published by

Marco

Data Scientist