Introduction to machine learning and data analysis (2018)

ECTS credits: 3

Course parameters:
Language: English
Level of course: PhD
Time of year: 6 September – 15 November 2018
No. of contact hours/hours in total incl. preparation, assignment(s) or the like: 17 hours of lectures; total workload including preparation, reading and assignments: 65 hours
Capacity limits: 10 participants

Objectives of the course:
The course will cover various data analysis methods within the scope of machine learning. The goal is to give a solid introduction to machine learning methods. The objective is not to make you an expert but to show you which tools are available in the field. After the course, students should know in which direction to look when faced with a data analysis problem. In the lectures, the mathematical and algorithmic details of each approach will be presented in a clear and simple fashion.

The lectures will be organized as:

  1. Introduction
    Linear regression, model assessment and selection (cross-validation, effective number of parameters, bias-variance tradeoff).
  2. Lagrange multipliers
    This method allows constraints (physical, biological or other) to be introduced into your model. It is the basis for the more refined linear regression seen in lecture 3.
  3. Linear regression upgraded
    Lasso regression, ridge regression and model-tailored regression. Concrete interpretation and consequences of these models on real data.
  4. Elements of Probability
    This lecture is a prerequisite for lecture 5. In this lecture, we go from basic probability to the log-likelihood.
  5. Linear classification
    Logistic regression and linear discriminant analysis.
  6. Parameter inference/optimization
    Inference of the parameters of different distribution models (Gaussian mixture, t-distribution) and of optimization problems. We will see inference using maximum likelihood, Monte Carlo and gradient descent.
  7. Non-linear regression
    Spline methods, non-parametric logistic regression.
  8. Dimensionality reduction
    Principal component analysis.
  9. Clustering
    Basic clustering methods (k-means, Gaussian mixture).
  10. Time series
    Auto-correlation, cross-correlation and some elements of stochastic processes (random walk).
  11. Numerical considerations
    Introduction to the stability of numerical methods, conditioning and some hardware knowledge.

Learning outcomes and competences:
At the end of the course, the student should be able to:

  • Analyze and criticize mathematical models for data analysis
  • Select the relevant model for the data context
  • Understand scientific implications of machine learning
  • Be aware of numerical considerations
  • Design a methodology for scientific interpretation

Compulsory programme:

  • Attendance in lectures
  • Completion of exercise assignments

Course contents:
As described above

Prerequisites:
Basic knowledge of calculus

Name of lecturer:
Postdoc Madeny Belkhiri, DANDRITE, Department of Molecular Biology and Genetics (lecturer)

Type of course/teaching methods:
Lectures, seminars, and exercises

Literature:

  • T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning
  • W. Appel: Mathématiques pour la physique
  • B. Guenin, J. Könemann, L. Tunçel: A Gentle Introduction to Optimization
  • G. B. Arfken, H. J. Weber, F. E. Harris: Mathematical Methods for Physicists

Course homepage:

Course assessment:
Active participation

Department of Molecular Biology and Genetics, Aarhus University

Special comments on this course:
This course should be useful for people who deal with data analysis/machine learning without having formal training in the field.

To be announced

To be announced

Deadline for registration is 5 September 2018.

For registration or if you have any questions, please contact Madeny Belkhiri, e-mail: