Max Planck Gesellschaft

IMPRS-gBGC course 'Applied statistics & data analysis' 2020, Advanced

Category: Skill course
0.2 CP per course day

1.  Advanced statistics

1.1  Organizational issues

Date: November 16 - 20, 2020
Place: lecture room at MPI-BGC (subject to COVID-19 regulations)
Planned sessions:

  • 09:00 - 09:45 lecture
  • 09:45 - 10:00 break
  • 10:00 - 11:00 talks
  • 11:00 - 11:15 break
  • 11:15 - 12:00 excursion
  • 12:00 - 13:00 lunch
  • 13:00 - 14:00 talks
  • 14:00 - 14:15 break
  • 14:15 - 15:00 lecture
  • 15:00 - 17:00 practical part


1.2  Aims and scope

The course will cover selected topics of advanced statistics and machine learning. Lectures on some topics will be accompanied by presentations by participants, “Excursion” talks on applications in research, and basic practicals in the afternoon. The course requires basic knowledge of statistics. The practical sessions require basic knowledge of a programming language; examples will be provided in R.

1.3  Presentations by participants (mandatory for assignment)

Participants will give a presentation (20 min + 10 min Q&A) on a paper or topic of their choice. Below you can find a list of suggested papers. If you want to work on a topic in a team of two (i.e. 40 min + 20 min Q&A) or suggest an alternative topic, please send your proposed topic by 31 October to

During registration, please choose a topic that has not yet been chosen.

All presentations need to be ready by Monday, 16 November 2020, at 9 am. The detailed schedule will be announced then.

The presentations should be educational and focus on the important things one should know about a method when applying it, i.e. the principle, advantages, disadvantages, assumptions, and pitfalls, rather than all mathematical details, derivations, theorems and proofs. Practical examples are often very illustrative.

1.4  Other Preparations

For the practicals, bring a laptop with a recent version of R installed and running. If you prefer another language, that is fine, but we will not provide corresponding code examples. Please also make sure that you can access the internet via WLAN (BGC-users if you have a BGC account; BGC-guests if you don't have one).

1.5  Preliminary agenda

Day Topic
Monday, November 16

Introduction to basic statistical tools

  • Introduction
  • Dimensionality reduction
  • Circular statistics
Tuesday, November 17

Time series analysis

  • Mixed effects models
Wednesday, November 18

Random Forests

  • Model evaluation
Thursday, November 19

Structural equation modelling

  • Variable importance
Friday, November 20

Parameter estimation

1.6  Material

Here, you can download the papers.

1.7  Interested?


  • Basic knowledge of a scientific computing language (e.g. R or Matlab)
  • Make use of the R course 'The basics'
  • Either the course 'Basic statistics' or a refresher of the typical “Statistics 1” lectures from university.

Exercises will be in R. The use of any other language is welcome; however, support depends on the person in charge and cannot be guaranteed.

Learn R: here is a list of useful online resources to help you take your R skills to the next level.
The material from the R basics course might also be useful for you.

The course can be taken as a stand-alone course (separate certificate) if you have a solid background in basic statistics. You can brush up your skills with the course 'Basic statistics'. Register here.

1.8  Requirements for the assignment

All participants have to prepare a short presentation on one "unconventional" method of their choice. Every day will include a few of these presentations, and we want to discuss the pros and cons with you. Please register for one of the following topics (but feel free to suggest another one).


  • Don’t choose a technique that you know already!
  • Check the list of participants below and choose a topic that has not yet been selected. Ideally, we would like to cover all topics.

...and note that we are not necessarily experts in these methods.

# | Presenter | Topic | Context
1 | | Archetypal Analysis | multivariate data representation
2 | ANN-SOPHIE LEHNERT | A working guide to boosted regression trees | non-parametric regression
3 | GÖKBEM DEMIR | From outliers to prototypes: Ordering data | novelty/outlier detection
4 | SANTIAGO BOTIA | Long Short-Term Memory | neural networks for time series
5 | | Calibration of process-oriented models | model calibration and evaluation
6 | SOPHIE VON FROMM | Deep learning | deep learning overview
7 | CAGLAR KUCUK | A unified approach to interpreting model predictions | variable importance, explainable AI
8 | | Quantile regression forests | random forest, quantile regression
9 | | Deep learning and process understanding for data-driven Earth system science | deep learning and hybrid modeling for Earth System Science
10 | SINIKKA PAULUS | Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure | model evaluation
11 | | MissForest—non-parametric missing value imputation for mixed-type data | random forests, data imputation (filling missing data)
12 | | Bias in random forest variable importance measures: Illustrations, sources and a solution | random forest, variable importance
13 | | Measuring and Testing Dependence by Correlation of Distances | non-linear correlation
14 | | Visualizing Data using t-SNE | dimensionality reduction, multivariate data visualization
15 | | The energy of data | non-parametric statistics based on distances
16 | | Summarizing multiple aspects of model performance in a single diagram | model evaluation
17 | | Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling | model evaluation and calibration
18 | | Isolation Forest | random forest, novelty/outlier detection
19 | | Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy | uncertainty
20 | | A comparison of techniques for the estimation of model prediction uncertainty | uncertainty
21 | EKATERINA BOGDANOVICH | Verification, validation, and confirmation of numerical models in the earth sciences | model evaluation and calibration
22 | | Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method | clustering
23 | | Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting | smoothing
24 | | Toward the true near‐surface wind speed: Error modeling and calibration using triple collocation | uncertainty

2.  Registration

Please register here by October 9, 2020.

3.  Participants

COVID-19 update (August 3, 2020): the current infection protection concept allows 16 persons inside the lecture hall without face masks. Due to this, the number of participants is now limited to 15 (plus 1 lecturer).

This page was last modified on August 06, 2020, at 03:20 PM

© 2011-2020 Max Planck Institute for Biogeochemistry