
Analyzing large data sets in ‘R’ – designed for biologists (2016)

Name of course: Analyzing large data sets in ‘R’ – designed for biologists


ECTS credits: 3


Course parameters:

Language: English

Level of course: PhD course, but open to Master's students and postdocs.

Time of year: 11 to 15 January 2016

No. of contact hours/hours in total incl. preparation, assignment(s) or the like: 50 contact hours (Monday to Friday, 9 AM to 5 PM) and 75 hours in total, including the assignment

Capacity limits: 20


Objectives of the course:

To give students the knowledge needed to collect, organize and analyze large data sets, with a focus on time series, chemometric techniques and advanced regression techniques.

Learning outcomes and competences:

At the end of the course, the student should be able to:

  • Import data into ‘R’ (own data, automatically collected data, external databases) and organize the data efficiently in a database structure.
  • Perform advanced data treatment, e.g. quality control and transformations.
  • Perform advanced statistical analysis on the data with the use of appropriate statistical tools.
  • Present the results graphically or in other appropriate ways.


Compulsory program:

  1. Read the course introduction literature and prepare a data set (own data, or data provided by the teacher).
  2. Active participation in the course.
  3. Complete a report documenting a data analysis performed during the course on a data set (own data, or data provided by the teacher). 


Course contents:

Introduction to R

- Importing data

- Data manipulation

- Effective data visualization using ggplot2

Generalized linear modeling using R

- Main error distribution families (normal, Poisson, binomial, negative-binomial, etc.)

- Model validation and selection

- Variance partitioning

Non-linear modeling

Canonical analyses

- Principal component analysis (PCA)

- Redundancy Analysis (RDA)

Advanced R (if enough time)




Prerequisites:

Basic knowledge of statistics and algebra


Name of lecturers:

Philippe Massicotte and guest lecturers from the Department of Bioscience


Type of course/teaching methods:

Lectures and exercises



Literature:

  • Zuur, Alain F., Elena N. Ieno, Neil Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. New York, NY: Springer New York. doi:10.1007/978-0-387-87458-6.
  • Oksanen, Jari, F Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R Minchin, R B O’Hara, Gavin L Simpson, Peter Solymos, M Henry H Stevens, and Helene Wagner. 2013. “Vegan: Community Ecology Package.”


Course assessment:

Active participation in the class and a completed project report.



Department of Bioscience, Aarhus University






Bioscience in Roskilde (Risø Campus)



The deadline for registration is 1 December 2015. However, we recommend that you sign up as soon as possible, as PhD students will be admitted on a ‘first come, first served’ basis and the course is limited to 20 students. Admission information will be sent to PhD students on a rolling basis as registrations are received. MSc students will receive information on 7 December, as PhD students have priority.

For registration: please send a mail to Charlotte Hviid Nielsen, e-mail:

If you have any questions regarding the content, please contact Philippe Massicotte, e-mail:

For questions about registration and administrative issues, please contact Charlotte Hviid Nielsen or Stiig Markager, e-mail:


Revised 05.11.2018