OBJECTIVES
Participants will learn the basics of R, including a technical introduction to R syntax. The course is suitable for participants with no prior knowledge of R and for those who want to refresh the basics.
Participants will also learn the most important concepts and terms in statistics and data analysis, and how to carry out initial exploratory and inferential statistical analyses in R.
WORKSHOP CONTENT
- Working with R and RStudio
- Installing and using extension packages in R
- Introduction to help pages and tips for self-help
- Explanation of the most important data types, operators (arithmetic and logical) and functions in R
- Importing and exporting data
- Working with data frames and vectors (numeric, logical, character, factors), e.g. indexing, splitting and converting variables or data sets
- Calculating simple summary statistics in R (e.g. median, mean, quantiles, variance)
- Definition of Data Science and other basic terms
- Introduction to ggplot2 for data visualisation
- Univariate descriptive statistics and data visualisation in R: frequency tables, bar charts, histograms, kernel density estimation, box plots, densities and distributions, QQ plots, etc.
- Multivariate descriptive statistics and data visualisation in R: cross tables, scatter plots, correlation
- Introduction to statistical inference: point estimation, interval estimation and confidence intervals
- Motivation and overview of statistical hypothesis testing
- Interpretation of results and explanation of terms related to hypothesis tests: significance level, p-value, test statistic, etc.
- Tests covered: t-test and Welch test (tests for differences in means), Mann-Whitney U test (also known as the Wilcoxon rank-sum test), Shapiro-Wilk test (test for normal distribution), Kolmogorov-Smirnov test
- Multiple testing: problems and solutions (e.g. Bonferroni correction)
- Introduction to the linear regression model
- Model evaluation and model diagnosis: MSE, R-squared, QQ plots and residual analysis
- Outlook: Generalised linear models with a focus on logistic regression
TARGET AUDIENCE
This course is suitable for participants with no prior knowledge of R and for those who want to refresh the basics. The workshop is very hands-on and is therefore limited to a maximum of 15 participants.
TECHNICAL REQUIREMENTS
Use a laptop/PC with reliable internet access and install the following software:
ABOUT THE TRAINER
Fiona Katharina Ewald specialises in Interpretable Machine Learning. She holds a Bachelor’s degree in Business Mathematics (B.Sc.) and a Master’s degree in Economics with a specialisation in Statistics (M.Sc.), both from the University of Duisburg-Essen.