PyCon UK

Get to grips with pandas and scikit-learn

Step by step Machine Learning project in Python

Sandrine Pataut

Sunday 16th, 14:30 (Room J)
Sunday 16th, 16:30 (Room J)


A workshop (3 hours)

We hear a lot about Machine Learning, but it’s just one part of a bigger process. Before applying any algorithm to a data set, discovery and preparation are needed. This hands-on workshop will cover an end-to-end classification project, from importing the data to evaluating a model’s performance. After this tutorial, you will have completed an entire step-by-step Machine Learning workflow.

Part one: Grab your spade and dig in!
Pandas is a popular tool that will allow us to efficiently conduct Exploratory Data Analysis. After loading the data set we’ll use in this workshop, we’ll have a first look at it with Pandas and start cleaning it. We’ll also use visualisation to gain more insights and continue to prepare our data.
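
As a rough idea of what this first look and cleaning can involve, here is a minimal sketch. The toy data, column names and cleaning choices below are assumptions made purely for illustration; they are not the workshop data set or the exact steps used in the session.

```python
import pandas as pd

# Hypothetical toy data standing in for the workshop data set.
df = pd.DataFrame({
    "age": [22, 35, None, 58, 41, 35],
    "income": [7.25, 71.3, 8.05, 26.55, 13.0, 71.3],
    "city": ["S", "C", "S", None, "S", "C"],
    "label": [0, 1, 1, 0, 0, 1],
})

# First look: size, column types and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Count the missing values in each column.
missing = df.isna().sum()
print(missing)

# Basic cleaning (one possible choice): fill numeric gaps with the median,
# categorical gaps with the most frequent value, then drop duplicate rows.
clean = df.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna(clean["city"].mode()[0])
clean = clean.drop_duplicates()

# Check how balanced the target classes are before modelling.
balance = clean["label"].value_counts()
print(balance)
```

With matplotlib installed, a call such as clean["age"].plot.hist() is one way to start visualising a column.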

Part two: Where the Ma(th)gic happens.
In this part, we’ll introduce the powerful scikit-learn library. We’ll split the data into training and testing sets and start pre-processing. Then we’ll choose, tune and train a Machine Learning model and finally evaluate its performance using cross-validation and a confusion matrix.
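
The steps in this part could look something like the sketch below. It assumes scikit-learn's built-in breast cancer data set, a logistic regression model and a small grid over the regularisation strength C; these are stand-ins chosen for illustration, not necessarily the data or model used in the workshop.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: a binary classification set shipped with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set; stratify so both splits keep the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pre-processing and model in one pipeline, so the scaler is fitted
# on training folds only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune the regularisation strength with 5-fold cross-validated grid search.
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)

# Cross-validated accuracy of the tuned pipeline on the training data.
cv_scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=5)
print("Mean CV accuracy:", cv_scores.mean())

# Confusion matrix on the held-out test set
# (rows: true classes, columns: predicted classes).
cm = confusion_matrix(y_test, search.predict(X_test))
print(cm)
```

Keeping the scaler inside the pipeline is a deliberate choice here: it avoids leaking information from the validation folds into the pre-processing step.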

During this workshop, we will fill in a pre-prepared Jupyter notebook together, explaining each step to get a good understanding of the process. You will also have a guided exercise notebook to reinforce your learning on unseen data.

To get the most out of this workshop you will need Python 3, pandas, matplotlib, scikit-learn and Jupyter installed. Please refer to the documentation for your operating system of choice, or search online for how to install these packages.

Clone the workshop repository as we will use it during the session.


  • The speaker suggested this session is suitable for new programmers.
  • The speaker suggested this session is suitable for data scientists.
