Statistical models for genomic prediction in animals and plants (2018)

ECTS credits:
3

Course parameters:
Language: English
Level of course: PhD course (also available for M.Sc. students – see comment)
Time of year: Summer 2018
No. of contact hours/hours in total incl. preparation, assignment(s) or the like: 35/80
Capacity limits: 15 students


Objectives of the course:
The course focuses on the quantitative genetics and statistical background of different genomic prediction models. It also covers estimation of variance components, theory on genomic heritabilities, Bayesian statistics, estimation of hyperparameters in Bayesian models, multi-trait models and simple genomic feature models. The use of all models is trained in computer practicals, with the objective that students gain an understanding of the statistical principles behind the different models and can analyse data while critically assessing the results from different statistical approaches.


Learning outcomes and competences:
At the end of the course, the student should be able to:

  • describe the common uses of genomic prediction in animal and plant breeding
  • analyse and discuss the statistical problems arising with large sets of predictors and common ways to handle these problems
  • structure and explain strengths and weaknesses of various statistical and computational tools to build prediction models from high-dimensional data
  • apply software tools for mixed models, ridge regression, LASSO and Bayesian MCMC methods
  • perform cross-validation studies and assess the predictive ability of models by prediction correlation and accuracy
  • explain and evaluate consequences of the data and population factors affecting predictive ability
  • apply prediction tools in an empirical data set
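
As an illustration of the cross-validation outcome above, the following is a minimal sketch in Python with NumPy, not course material. It fits a ridge regression on SNP genotypes and reports the prediction correlation from k-fold cross-validation. The simulated data, the function names (ridge_snp_effects, kfold_predictions) and the shrinkage value lam=100 are all assumptions made for this example.

```python
import numpy as np

def ridge_snp_effects(Z, y, lam):
    """Ridge estimates of SNP effects: solve (Z'Z + lam*I) b = Z'y."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

def kfold_predictions(Z, y, lam, k=5, seed=0):
    """Predict every record once, using a model trained on the other folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    y_hat = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Centre with training-set statistics only, so no test information leaks in.
        mu = y[train].mean()
        col_means = Z[train].mean(axis=0)
        b = ridge_snp_effects(Z[train] - col_means, y[train] - mu, lam)
        y_hat[test_idx] = mu + (Z[test_idx] - col_means) @ b
    return y_hat

# Hypothetical toy data: 200 individuals, 500 SNPs coded 0/1/2, 20 causal SNPs.
rng = np.random.default_rng(42)
n, p = 200, 500
Z = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta_true = np.zeros(p)
beta_true[:20] = rng.normal(size=20)
g = Z @ beta_true
y = g + rng.normal(scale=g.std(), size=n)  # heritability of about 0.5 by construction

y_hat = kfold_predictions(Z, y, lam=100.0)
r = np.corrcoef(y, y_hat)[0, 1]                          # prediction correlation
slope = np.cov(y, y_hat)[0, 1] / np.var(y_hat, ddof=1)   # regression of y on y_hat (bias check)
print(f"prediction correlation: {r:.3f}")
print(f"approximate accuracy (r / sqrt(h2), h2 = 0.5 assumed): {r / np.sqrt(0.5):.3f}")
print(f"slope of y on y_hat: {slope:.3f}")
```

Centring inside each fold is a deliberate choice in this sketch: statistics computed on the full data would let test records influence the training step.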


Compulsory programme:

  • A set of approx. 5 key papers is distributed, which students are expected to study as preparation.
  • Five full days of lectures, computer exercises, and review and discussion of the exercises. Students should participate actively in the discussions; after each practical, a few students will be asked to present their results, which will then be discussed with the other students.


Course contents:
Teaching sessions are scheduled for 5 days:

  • Day 1: Background on genomic prediction and genomic selection in animals and plants, and the relevance of generation interval and accuracy in breeding programs; comparison to classical approaches (QTL mapping, MAS); simple approaches using GWAS results. Simple mixed model (SNP-BLUP, aka rrBLUP) for whole-genome prediction.
  • Day 2: Tackling the large-p, small-n problem using random/shrinkage effects and cross-validation. CV using split data, k-fold, leave-one-out and cross-validation across families; explained variance in training and test data, accuracy and bias of predictions. Building of the G-matrix and the GBLUP model; comparison of SNP-BLUP and GBLUP variance components and predictions. Afternoon: preparation of literature presentation (in pairs).
  • Day 3: Morning: Literature review by students. Afternoon theory: Different scaling methods for G-matrices (VanRaden methods 1, 2, 3 and 4), scaling and interpretation of relationships and inbreeding in the G-matrix. Single-step GBLUP: combining and scaling the A and G matrices. General introduction to Bayesian statistics.
  • Day 4: Bayesian shrinkage models: BayesA and LASSO and their hyperparameters; Bayesian variable selection models and their hyperparameters. Background on implementation of Bayesian methods using MCMC, MCMC post-analysis and convergence assessment.
  • Day 5: Morning: Presentations by students on some exercise results (M.Sc. students add their report outline). Afternoon: theory on genomic heritability and effects of relationships in populations; impact of relationships on predictions and comparison of methods with strong and weak relationships. Multi-trait Bayesian models and simple genomic feature models with variance components and predictions by chromosome.
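
The comparison of SNP-BLUP and GBLUP listed for Day 2 can be illustrated with a small sketch in Python with NumPy. It is an illustration under stated assumptions, not the course's own exercise: the G-matrix is built with VanRaden method 1, the variance ratio lam_g is an arbitrary assumed value, the overall mean is simply set to the sample mean, and the data and the function name vanraden_g are made up for the example.

```python
import numpy as np

def vanraden_g(M):
    """VanRaden method 1: G = W W' / (2 * sum p_j (1 - p_j)), with W = M - 2p.

    M holds genotypes coded 0/1/2 (individuals in rows, SNPs in columns);
    allele frequencies p are estimated from the data at hand.
    """
    p = M.mean(axis=0) / 2.0
    W = M - 2.0 * p
    c = 2.0 * np.sum(p * (1.0 - p))
    return W @ W.T / c, W, c

# Hypothetical toy data: 50 individuals, 300 SNPs, random phenotypes.
rng = np.random.default_rng(1)
n, m = 50, 300
freqs = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, freqs, size=(n, m)).astype(float)
y = rng.normal(size=n)
y_c = y - y.mean()  # simplification: mean treated as known

G, W, c = vanraden_g(M)

# GBLUP: g_hat = G (G + lam_g I)^-1 y_c, with lam_g = sigma_e^2 / sigma_g^2 (assumed value).
lam_g = 1.0
g_gblup = G @ np.linalg.solve(G + lam_g * np.eye(n), y_c)

# SNP-BLUP: b_hat = (W'W + lam_b I)^-1 W'y_c, with lam_b = sigma_e^2 / sigma_b^2.
# With this scaling of G, sigma_g^2 = c * sigma_b^2, hence lam_b = c * lam_g.
lam_b = c * lam_g
b_hat = np.linalg.solve(W.T @ W + lam_b * np.eye(m), W.T @ y_c)
g_snpblup = W @ b_hat

print("max abs difference:", np.max(np.abs(g_gblup - g_snpblup)))
```

The two sets of genomic values agree up to numerical error because W(W'W + lam_b I)^-1 W' equals WW'(WW' + lam_b I)^-1, and dividing both factors by c gives G(G + lam_g I)^-1.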


Prerequisites:
Background in linear models (regression, multiple regression) and preferably in mixed models (random effects, variance components).


Name of lecturer:
Luc Janss


Type of course/teaching methods:
Lectures, computer exercises, literature review and presentations by students


Literature:
Approx. 5 key papers and class notes.


Course homepage:
None


Course assessment:
Assessment is based on presentations and active participation in the discussion of the exercises. PhD students receive a course certificate on successful completion of the course.


Provider:
Department of Molecular Biology and Genetics, Aarhus University


Special comments on this course:
The course is also available as a 5 ECTS M.Sc. course with a higher workload (adding a project report and exam). M.Sc. students get a certificate with a grade on the Danish 12-point scale.


Time:
30 July – 3 August 2018


Place:
AU central campus, MBG Forskerparken, Gustav Wiedsvej 10 (exact location may change depending on the number of students and the room size needed).


Registration:
Register by 20 July 2018 by sending an email to the course coordinator, Luc Janss (luc.janss@mbg.au.dk).