Statistical models for genomic prediction in animals and plants

Name of course:

Statistical models for genomic prediction in animals and plants 

ECTS credits:

Course parameters:
Language: English
Level of course: PhD course (also available for MSc students; see Special comments).
Time of year: Summer 2017
No. of contact hours/hours in total incl. preparation, assignment(s) or the like: 35/80
Capacity limits: 30 students
Course Fee: 1500 DKK 

Objectives of the course:
The course focuses on the quantitative genetics and statistical background of different genomic prediction models. It also covers estimation of variance components, theory on genomic heritabilities, Bayesian statistics, estimation of hyperparameters in Bayesian models, multitrait models and simple genomic feature models. All models are practised in computer practicals, with the objective that students understand the statistical principles of the different models and can analyse data with a critical assessment of the results from different statistical approaches.

Learning outcomes and competences:
At the end of the course, the student should be able to:

  • describe the common uses of genomic prediction in animal and plant breeding
  • analyze and discuss the statistical problems arising with large sets of predictors and common ways to handle these problems
  • structure and explain strengths and weaknesses of various statistical and computational tools to build prediction models from high dimensional data
  • apply software tools for mixed models, ridge regression, LASSO and Bayesian MCMC methods
  • perform cross-validation studies and assess predictive ability of models by prediction correlation and accuracy
  • explain and evaluate consequences of the data and population factors affecting predictive ability
  • apply prediction tools in an empirical data set 

Compulsory programme:

  • a set of key papers (approx. 5) is distributed, which students are expected to study as preparation
  • 5 full days of lectures, computer exercises, and review / discussion of the exercises. Students should actively participate in the discussions. After each practical, a few students will be asked to present their results, which will be discussed with the other students.

Course contents:
Teaching sessions are scheduled for 5 days:

  • Day 1: background on genomic prediction and genomic selection in animals and plants, and relevance of generation interval and accuracy in breeding programs; comparison to classical approaches (QTL mapping, MAS); simple approaches using GWAS results. Simple mixed model (SNP-BLUP, also known as rrBLUP) for whole-genome prediction.
  • Day 2: tackling the large-p, small-n problem using random/shrinkage effects and cross-validation. Cross-validation using split data, k-fold, leave-one-out and cross-validation across families; explained variance in training and test data, accuracy and bias of predictions. Building of the G-matrix and the GBLUP model; comparison of SNP-BLUP and GBLUP variance components and predictions. Multitrait GBLUP and G-REML.
  • Day 3: Different scaling methods for G-matrices (VanRaden methods 1, 2, 3 and 4); scaling and interpretation of relationships and inbreeding in the G-matrix. Single-step GBLUP: combining and scaling the A- and G-matrices. General introduction to Bayesian statistics.
  • Day 4: Bayesian shrinkage models: BayesA and LASSO and their hyperparameters; Bayesian variable selection models and their hyperparameters. Background on implementation of Bayesian methods using MCMC, MCMC post-analysis and convergence assessment.
  • Day 5: Theory on genomic heritability and effects of relationships in populations; impact of relationships on predictions and comparison of methods with strong and weak relationships. Multitrait Bayesian models and simple genomic feature models with variance components and predictions by chromosome.
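
As an illustration of the Day 2 comparison of SNP-BLUP and GBLUP predictions, a minimal NumPy sketch is given here. It is not course material: the simulated genotypes, the variance ratio `lam_g`, and all function names are assumptions made for this example, and the overall mean is handled simply by centring the phenotypes.

```python
import numpy as np

def vanraden_g(M):
    """Return (G, Z, c) for an n x p genotype matrix coded 0/1/2 (VanRaden method 1)."""
    freq = M.mean(axis=0) / 2.0              # observed allele frequencies
    Z = M - 2.0 * freq                       # centre each SNP column
    c = 2.0 * np.sum(freq * (1.0 - freq))    # scaling constant of the G-matrix
    return Z @ Z.T / c, Z, c

def snp_blup(Z, y, lam):
    """Ridge-type SNP effects: (Z'Z + lam*I)^-1 Z'(y - mean(y))."""
    yc = y - y.mean()
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ yc)

def gblup(G, y, lam_g):
    """Genomic values: G (G + lam_g*I)^-1 (y - mean(y))."""
    yc = y - y.mean()
    return G @ np.linalg.solve(G + lam_g * np.eye(len(y)), yc)

# Simulated data (hypothetical sizes and effect variances)
rng = np.random.default_rng(1)
n, p = 50, 200
M = rng.binomial(2, rng.uniform(0.1, 0.9, p), size=(n, p)).astype(float)
y = M @ rng.normal(0.0, 0.1, p) + rng.normal(0.0, 1.0, n)

G, Z, c = vanraden_g(M)
lam_g = 1.0                                  # assumed ratio of residual to genomic variance
gv_gblup = gblup(G, y, lam_g)
gv_snp = Z @ snp_blup(Z, y, c * lam_g)       # the SNP-level ratio is c times the GBLUP ratio
print(np.allclose(gv_snp, gv_gblup))
```

With the shrinkage parameters matched in this way (`lam = c * lam_g`), the two models should give the same genomic values up to numerical precision, although SNP-BLUP solves a p x p system and GBLUP an n x n system.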

Each morning consists of a 1.5-2 hour lecture followed by a 1.5-2.5 hour exercise, and the same scheme is repeated in the afternoon. As part of the practical exercises, students will be asked to present their results, which will be discussed with the other students.

Prerequisites:
Background in linear models (regression, multiple regression) and preferably in mixed models (random effects, variance components). 

Name of lecturers:
Luc Janss, Theo Meuwissen.

Type of course/teaching methods:
Lectures, computer exercises, short presentations by students  

Literature:
Approx. 5 key papers and class notes. 

Course homepage:
None 

Course assessment:
Assessment is based on short presentations during practicals and active participation in the discussion of the exercises. PhD students receive a course certificate on successful completion of the course.

Provider:
Department of Molecular Biology and Genetics, Aarhus University 

Special comments on this course:
The course is also available as a 5 ECTS MSc course with a higher workload (adding a project report and exam). The MSc version of this course will be announced in the AU summer courses for MSc students.

There is a course fee of 1500 DKK to cover expenses and this fee includes lunch and refreshments during the course days. Accommodation is not included in the course fee and should be arranged separately.

Time:
26 - 30 June 2017  

Place:
Asmildkloster Landbrugsskole, Viborg. Accommodation can be provided.  

Registration:
The deadline for registration is 1 May 2017. The course fee of 1500 DKK must be paid at registration.
For registration, go to the AU webshop.

For questions, please contact course coordinator Luc Janss (luc.janss@mbg.au.dk).