Category: Skill course
0.2 CP per course day
Date: November 16 - 20, 2020
Place: lecture room @ MPI-BGC (depending on COVID-19 regulations)
The course will cover selected topics of advanced statistics and machine learning. Lectures on some topics will be accompanied by participant presentations, “Excursion” talks on applications in research, and basic practicals in the afternoon. The course requires basic knowledge of statistics. The practical sessions require basic knowledge of a programming language; examples will be provided in R.
Participants will give a presentation (20 min + 10 min Q&A) on a paper or topic of their choice. Below you can find a list of suggested papers. If you want to work on a topic in a team of two (i.e. 40 min + 20 min Q&A) or suggest an alternative topic, please send your request with the proposed topic to email@example.com by 31 October.
When registering, please choose a topic that has not yet been taken.
All presentations must be ready by Monday, 16 November 2020, at 9 am. The detailed schedule will be announced then.
The presentations should be educational and focus on the important things one should know about a method when applying it, i.e. the principle, advantages, disadvantages, assumptions, and pitfalls, rather than all mathematical details, derivations, theorems and proofs. Practical examples are often very illustrative.
Bring a laptop with a recent version of R installed for the practicals. If you prefer another language, that is fine, but we will not provide corresponding code examples. Please also make sure that you can access the internet via WLAN (BGC-users, if you have a BGC account; BGC-guests, if you don't have an account).
|Monday, November 16|
Introduction to basic statistical tools
|Tuesday, November 17|
Time series analysis
|Wednesday, November 18|
|Thursday, November 19|
Structural equation modelling
|Friday, November 20|
Here, you can download the papers.
Exercises will be in R. Other languages are welcome; however, support depends on the person in charge and cannot be guaranteed.
The course can be taken as a stand-alone course (separate certificate) if you already have a solid background in basic statistics. You can brush up your skills with the course 'Basic statistics'. Register here.
All participants have to prepare a short presentation on one "unconventional" method of their choice. Every day will include a few of these presentations, and we want to discuss their pros and cons with you. Please register for one of the following topics (but feel free to add another one).
...and note that we are not necessarily experts in these methods.
|# / NAME OF PRESENTER||Topic||Context|
|1||Archetypal Analysis||Multivariate data representation|
|2 / ANN-SOPHIE LEHNERT||A working guide to boosted regression trees||non-parametric regression|
|3 / GÖKBEN DEMIR||From outliers to prototypes: Ordering data||novelty/outlier detection|
|4 / SANTIAGO BOTIA||Long Short-Term Memory||neural networks for time series|
|5||Calibration of process-oriented models||model calibration and evaluation|
|6 / SOPHIE VON FROMM||Deep learning||deep learning overview|
|7 / CAGLAR KUCUK||A unified approach to interpreting model predictions||variable importance, explainable AI|
|8||Quantile regression forests||random forest, quantile regression|
|9 / NA LI||Deep learning and process understanding for data-driven Earth system science||deep learning and hybrid modeling for Earth System Science|
|10 / SINIKKA PAULUS||Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure||model evaluation|
|11||MissForest—non-parametric missing value imputation for mixed-type data||random forests, data imputation (filling missing data)|
|12 / WEIJIE ZHANG||Bias in random forest variable importance measures: Illustrations, sources and a solution||random forest, variable importance|
|13||Measuring and Testing Dependence by Correlation of Distances||non-linear correlation|
|14||Visualizing Data using t-SNE||dimensionality reduction, multivariate data visualization|
|15||The energy of data||non-parametric statistics based on distances|
|16||Summarizing multiple aspects of model performance in a single diagram||model evaluation|
|17 / WANTONG LI||Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling||model evaluation and calibration|
|18||Isolation Forest||random forest, novelty/outlier detection|
|19||Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy||uncertainty|
|20||A comparison of techniques for the estimation of model prediction uncertainty||uncertainty|
|21 / EKATERINA BOGDANOVICH||Verification, validation, and confirmation of numerical models in the earth sciences||model evaluation and calibration|
|22||Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method||clustering|
|23||Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting||smoothing|
|24 / JASPER DENISSEN||Toward the true near‐surface wind speed: Error modeling and calibration using triple collocation||uncertainty|
Please register here by October 9, 2020.
COVID-19 update (August 14, 2020): after consulting the corona team, more than 16 persons are now allowed inside the lecture hall without wearing a face mask. As a result, the number of participants is now limited to 19 (plus 1 lecturer). Please note that our infection protection plan is based on that of the City of Jena, which will be updated at the end of August. Changes may occur.
This page was last modified on September 27, 2020, at 11:22 AM