Day 1: Introduction to Python and Data Processing
Fundamentals of programming in Python
Working with data in Pandas
Data cleaning, filtering and transforming
Manipulating dates and strings
Groupby and aggregations, multi-index
Long and wide data formats
Data import and export, linking with Excel
Day 2: Data Visualisation
Fundamentals of data visualisation in Python
Matplotlib, Seaborn, and Plotly libraries
Visualising groupby results and aggregations
Interactive charts and dashboard elements
Geographical and multivariate charts
Principles of data communication
Day 3: Statistics and Regression Modelling
Fundamentals of statistics and probability
Statistical hypothesis testing
Interpretation of statistical results and p-values
Linear and logistic regression
Metrics of predictive power
Correlation, causality and randomisation
Natural experiments
Statistics vs. machine learning
Day 4: Machine Learning
AI from the ground up: concepts, types and uses
Training machine learning models
Classification and regression models
Sensitivity, specificity, ROC curve
Interpretation of ML model decision-making
Neural networks and deep learning
Unsupervised learning, t-SNE
Integration of LLMs into Python projects
Extraction of structured data from text
Day 5: Data Hackathon!
In cooperation with our partners, we have prepared challenging data tasks using data from the healthcare and education sectors. The goal of the hackathon is for every participant to put their new data and programming skills into practice and, at the same time, learn something new about important social topics. Several teams have already come up with interesting findings in both tasks, which provided new insights for stakeholders.