Max Planck Gesellschaft

IMPRS-gBGC course 'Applied statistics & data analysis' 2020, Advanced

Category: Skill course
0.2 CP per course day

1.  Advanced statistics

1.1  Organizational issues

Date: November 16 - 20, 2020
Place: lecture room @ MPI-BGC (depending on COVID-19 regulations)
Planned sessions:

  • 09:00 - 09:45 lecture
  • 09:45 - 10:00 break
  • 10:00 - 11:00 talks
  • 11:00 - 11:15 break
  • 11:15 - 12:00 excursion
  • 12:00 - 13:00 lunch
  • 13:00 - 14:00 talks
  • 14:00 - 14:15 break
  • 14:15 - 15:00 lecture
  • 15:00 - 17:00 practical part

Instructor:


1.2  Aims and scope

The course will cover selected topics of advanced statistics and machine learning. Lectures on some topics will be accompanied by presentations by participants, “Excursion” talks on applications in research, and basic practicals in the afternoon. The course requires basic knowledge of statistics. The practical sessions require basic knowledge of a programming language; examples will be provided in R.

1.3  Presentations by participants (mandatory for assignment)

Participants will give a presentation (20 min + 10 min Q&A) on a paper or topic of their choice. Below you can find a list of suggested papers. If you want to work on a topic in a team of two (i.e. 40 min + 20 min Q&A) or suggest an alternative topic, please send your proposal to mjung@bgc-jena.mpg.de by 31 October.

During registration, please choose a topic that has not yet been chosen.

All presentations need to be ready on Monday 16th Nov 2020 at 9 am. The detailed schedule will be announced then.

The presentations should be educational and focus on the important things one should know about a method when applying it, i.e. the principle, advantages, disadvantages, assumptions, and pitfalls, rather than all mathematical details, derivations, theorems, and proofs. Practical examples are often very illustrative.

1.4  Other Preparations

Bring a laptop with a recent version of R installed and running for the practicals. If you prefer another language, that is fine, but we will not provide corresponding code examples. Please also make sure that you can access the internet via WLAN (BGC-users, if you have a BGC account; BGC-guests, if you don't have an account).

1.5  Preliminary agenda

Day / Time | Topic | Who
Monday, November 16
09:00 - 09:45 | Introduction to basic statistical tools | Martin Jung
09:45 - 10:00 | Break
10:00 - 10:30 | Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy
10:30 - 11:00 | Toward the true near‐surface wind speed: Error modeling and calibration using triple collocation
11:00 - 11:15 | Break
11:15 - 12:00 | Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method
12:00 - 13:00 | Lunch Break
13:00 - 13:30 | Archetypal Analysis
13:30 - 14:00 | Visualizing Data using t-SNE
14:00 - 14:15 | Break
14:15 - 15:00 | Dimensionality reduction | Mirco Migliavacca
15:00 - 17:00 | Practical | Mirco Migliavacca
Tuesday, November 17
09:00 - 09:45 | Time series analysis | Lina Estupinan-Suarez
09:45 - 10:00 | Break
10:00 - 10:30 | Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure
10:30 - 11:00 | Summarizing multiple aspects of model performance in a single diagram
11:00 - 11:15 | Break
11:15 - 12:00 | EXCURSION | Nora Linscheid
12:00 - 13:00 | Lunch Break
13:00 - 13:30 | BGI SEMINAR
13:30 - 14:00 | BGI SEMINAR
14:00 - 14:15 | Break
14:15 - 15:00 | Mixed effect model | Thomas Wutzler
15:00 - 17:00 | Practical | Thomas Wutzler
Wednesday, November 18
09:00 - 09:45 | Random Forests | Martin Jung
09:45 - 10:00 | Break
10:00 - 10:30 | Bias in random forest variable importance measures: Illustrations, sources and a solution
10:30 - 11:00 | Isolation Forest
11:00 - 11:15 | Break
11:15 - 12:00 | EXCURSION | Jacob Nelson
12:00 - 13:00 | Lunch Break
13:00 - 13:30 | A working guide to boosted regression trees
13:30 - 14:00 | A unified approach to interpreting model predictions
14:00 - 14:15 | Break
14:15 - 15:00 | Model evaluation | Martin Jung
15:00 - 17:00 | Practical | Simon Bessnard
Thursday, November 19
09:00 - 09:45 | Neural Networks | Basil Kraft
09:45 - 10:00 | Break
10:00 - 10:30 | Deep learning
10:30 - 11:00 | Long Short-Term Memory
11:00 - 11:15 | Break
11:15 - 12:00 | EXCURSION | Basil Kraft
12:00 - 13:00 | Lunch Break
13:00 - 13:30 | Variable Importance | Martin Jung
13:30 - 15:00 | Practical | Fabian Ganz
Friday, November 20
09:00 - 09:45 | Parameter estimation | Nuno Carvalhais
09:45 - 10:00 | Break
10:00 - 10:30 | Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling
10:30 - 11:00 | A comparison of techniques for the estimation of model prediction uncertainty
11:00 - 11:15 | Break
11:15 - 12:00 | EXCURSION | Tina Trautmann
12:00 - 13:00 | Lunch Break
13:00 - 13:30 | Deep learning and process understanding for data-driven Earth system science
13:30 - 14:00 | Feedback

1.6  Interested?

Prerequisites:

  • Basic knowledge of a language of scientific computing: R, Matlab
  • Make use of the R course - The basics
  • Either the course 'Basic statistics' or a working recollection of a typical “Statistics 1” university lecture.

Exercises will be in R. The use of any other language is welcome; however, support depends on the person in charge and cannot be guaranteed.



Learn R… Here is a list of useful online resources to help you bring your R skills to a new level.
The material from the R basics course might also be useful for you.

1.7  Material

Here you can download the papers you will need for your presentation.

1.8  Requirements for the assignment

All participants have to prepare a short presentation on one "unconventional" method of their choice. Every day will have a few of these presentations, and we want to discuss the pros and cons with you. Please register for one of the following topics (but feel free to add another one).

Important

  • Don’t choose a technique that you know already!
  • Check the list of participants below and choose a topic that has not yet been selected. Ideally, we would like to cover all topics.

Also note that we are not necessarily experts in these methods.

# / NAME OF PRESENTER | Topic | Context
1 / SOPHIA WALTER | Archetypal Analysis | Multivariate data representation
2 / ANN-SOPHIE LEHNERT | A working guide to boosted regression trees | non-parametric regression
3 / | From outliers to prototypes: Ordering data | novelty/outlier detection
4 / SANTIAGO BOTIA | Long Short-Term Memory | neural networks for time series
5 / | Calibration of process-oriented models | model calibration and evaluation
6 / SOPHIE VON FROMM | Deep learning | deep learning overview
7 / CAGLAR KUCUK | A unified approach to interpreting model predictions | variable importance, explainable AI
8 / | Quantile regression forests | random forest, quantile regression
9 / | Deep learning and process understanding for data-driven Earth system science | deep learning and hybrid modeling for Earth System Science
10 / SINIKKA PAULUS | Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure | model evaluation
11 / | MissForest—non-parametric missing value imputation for mixed-type data | random forests, data imputation (filling missing data)
12 / WEIJIE ZHANG | Bias in random forest variable importance measures: Illustrations, sources and a solution | random forest, variable importance
13 / | Measuring and Testing Dependence by Correlation of Distances | non-linear correlation
14 / ULISSE GOMARASCA | Visualizing Data using t-SNE | dimensionality reduction, multivariate data visualization
15 / | The energy of data | non-parametric statistics based on distances
16 / QIAN ZHANG | Summarizing multiple aspects of model performance in a single diagram | model evaluation
17 / WANTONG LI | Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling | model evaluation and calibration
18 / HOONTAEK LEE | Isolation Forest | random forest, novelty/outlier detection
19 / YUNPENG LUO | Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy | uncertainty
20 / SIYUAN WANG | A comparison of techniques for the estimation of model prediction uncertainty | uncertainty
21 / | Verification, validation, and confirmation of numerical models in the earth sciences | model evaluation and calibration
22 / ALBRECHT SCHALL | Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method | clustering
23 / | Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting | smoothing
24 / JASPER DENISSEN | Toward the true near‐surface wind speed: Error modeling and calibration using triple collocation | uncertainty

2.  Participants

COVID-19 update (August 14, 2020): after talking to the corona team, more than 16 persons are allowed to be inside the lecture hall without wearing a face mask. Due to this, the number of participants is now limited to 19 (plus 1 lecturer). Please note that our infection protection plan is based on that of the City of Jena, which will be updated at the end of August. Changes might occur.



This page was last modified on November 17, 2020, at 10:20 AM

© 2011-2020 Max Planck Institute for Biogeochemistry