Topic 1: Introduction
We start with an overview of the course and a brief introduction to machine learning and its different variants.
• Why use machine learning?
• Machine learning basics and terminology
• The biggest challenge in machine learning
• Machine learning frameworks: supervised, semi-supervised, unsupervised and reinforcement learning
Topic 2: Regression
We will make a gentle start with regression. In the regression setting, a machine learning model will need to predict a number.
• The regression setting and its assumptions
• The mean squared error (MSE) and mean absolute error (MAE)
• Outliers in regression
• Linear regression and K-nearest neighbour regression
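The difference between MSE and MAE, and why outliers in regression matter, can be sketched as follows. This is an illustrative toy example, not part of the course materials:

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# A single outlier inflates the MSE far more than the MAE,
# because its residual is squared.
y_true = [1.0, 2.0, 3.0, 100.0]   # last point is an outlier
y_pred = [1.0, 2.0, 3.0, 4.0]
print(mse(y_true, y_pred))  # 2304.0
print(mae(y_true, y_pred))  # 24.0
```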
Topic 3: Classification
In classification, a machine learning model will need to predict a category or class.
• Terminology and basics of classification
• Building classifiers using histograms, nearest mean (nearest medoid) classifier, K-nearest neighbour (KNN) classifier
• The Bayes classifier and the Bayes error
• How to use the KNN classifier in practice
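A minimal sketch of the K-nearest neighbour classifier on toy 2-D data (brute-force distances and majority voting; data and names are illustrative):

```python
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by squared Euclidean distance to x.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, x)), label)
        for p, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:k]]
    return Counter(votes).most_common(1)[0][0]

X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, (0.5, 0.5)))  # a
print(knn_predict(X_train, y_train, (5.5, 5.5)))  # b
```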
Topic 4: Training Models
Gradient descent is an iterative procedure to train models, such as logistic regression and neural networks.
• The basics of gradient descent
• The three variants of gradient descent: batch, mini-batch and stochastic gradient descent (SGD)
• How to tune gradient descent
• The basics of logistic regression
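Batch gradient descent, the first of the three variants above, can be sketched by fitting a one-variable linear model to toy data (learning rate and step count are illustrative choices):

```python
def batch_gradient_descent(xs, ys, lr=0.01, steps=2000):
    """Fit y ~ w*x + b by batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) over the full batch.
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1
w, b = batch_gradient_descent(xs, ys)
print(round(w, 2), round(b, 2))  # approximately 2.0 and 1.0
```

Mini-batch and stochastic gradient descent differ only in how many points each gradient is averaged over.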
Topic 5: Overfitting
Overfitting is the problem where a machine learning algorithm performs well on the training set but does not perform well on new and unseen data.
• How to use linear models for nonlinear tasks
• The bias-variance trade-off and the curse of dimensionality
• How to use learning curves to estimate the amount of data needed
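One common way to use a linear model for a nonlinear task is polynomial feature expansion; a minimal, illustrative sketch:

```python
def polynomial_features(x, degree):
    """Map a scalar x to the feature vector (x, x^2, ..., x^degree)."""
    return [x ** d for d in range(1, degree + 1)]

# A model that is linear in these features, e.g. w1*x + w2*x^2 + b,
# is a nonlinear (quadratic) function of the original input x.
print(polynomial_features(2.0, 3))  # [2.0, 4.0, 8.0]
```

Raising the degree increases model flexibility, which connects directly to the bias-variance trade-off: too low a degree underfits, too high a degree overfits.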
Topic 6: Cross Validation & Regularization
To get a good estimate of the performance of machine learning models, cross validation is an essential technique; it is also essential for tuning model hyperparameters. Finally, we discuss regularization, a technique that aims to avoid overfitting.
• Cross validation, model selection and hyperparameter tuning
• Ridge regression
• LASSO regularization and how it’s used for variable selection
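The index bookkeeping behind K-fold cross validation can be sketched as follows (contiguous folds without shuffling, for simplicity; a practical version would shuffle first):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, validation) index pairs."""
    folds = []
    fold_size = n // k
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n  # last fold takes the remainder
        val = list(range(start, end))
        train = [j for j in range(n) if j < start or j >= end]
        folds.append((train, val))
    return folds

# Each point appears in exactly one validation fold.
for train_idx, val_idx in k_fold_indices(10, 5):
    print(val_idx)
```

The model is trained k times, once per fold, and the k validation scores are averaged; repeating this per hyperparameter value gives the tuning procedure.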
Topic 7: Classifier Evaluation
This topic delves deeper into the various metrics used to evaluate classifiers.
• What a “good” accuracy means (e.g., naïve baselines/dummy classifiers)
• The confusion matrix (false positive, false negative, costs)
• ROC curves
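Why raw accuracy can be misleading is easy to demonstrate with confusion-matrix counts on an imbalanced toy set (illustrative data, pure Python):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

# On a 10/90 imbalanced set, a dummy classifier that always predicts 0
# scores 90% accuracy -- yet it finds zero true positives.
y_true = [1] * 10 + [0] * 90
y_dummy = [0] * 100
tp, fp, fn, tn = confusion_counts(y_true, y_dummy)
print((tp + tn) / len(y_true))  # 0.9
print(tp)                       # 0
```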
Topic 8: Support Vector Machines
The support vector machine (SVM) is a well-known, more advanced classification model.
• Basics of the SVM, the margin and the hard-margin SVM
• The soft-margin SVM
• Kernels
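A kernel itself is just a similarity function between inputs; the widely used Gaussian (RBF) kernel can be sketched in a few lines (the `gamma` parameter is an illustrative choice):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Identical points get similarity 1; similarity decays with distance.
print(rbf_kernel((0, 0), (0, 0)))  # 1.0
print(rbf_kernel((0, 0), (3, 0)) < rbf_kernel((0, 0), (1, 0)))  # True
```

In a kernelized SVM, such a function replaces the inner products in the optimization problem, letting a linear margin in feature space describe a nonlinear boundary in input space.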
Topic 9: Decision Trees
Decision trees are simple, interpretable and user-friendly models.
• Basics of decision trees and their terminology
• How to train decision trees with CART
• Overfitting and other pros and cons of decision trees
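At the core of CART is an impurity measure used to score candidate splits; the Gini impurity can be sketched as follows (toy labels, pure Python):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))  # 0.0  (pure node)
print(gini(["a", "a", "b", "b"]))  # 0.5  (maximally mixed, two classes)
```

CART greedily picks the split whose children have the lowest weighted impurity, and repeats this recursively.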
Topic 10: Final Project
The final project will involve building a machine learning pipeline, including hyperparameter tuning and a careful and fair evaluation, to solve a small practical application: the recognition of handwritten digits (MNIST).