BACKGROUND
Machine learning (ML) is increasingly used, and holds great potential, in scientific fields traditionally driven by physical models and process-based explanations, such as the Earth and environmental sciences. ML enables researchers to analyze large datasets, uncover patterns, and develop models that complement traditional approaches to understanding complex environmental systems and processes.
However, ML models often lack transparency and scientific interpretability, making it difficult to:
- Verify whether predictions align with real-world physics and scientific principles.
- Identify errors, biases, or unintended shortcuts in model reasoning.
- Communicate findings in a way that is scientifically meaningful and justifiable.
To address these challenges, several approaches exist to improve the trustworthiness of ML in scientific research, including explainable AI (XAI) techniques and physics-informed ML modeling. These techniques have limitations of their own, however, and it is essential to understand what ML models can and cannot do when applying them in your field of research.
WORKSHOP GOAL
By the end of the workshop, participants will have an overview of the role of ML in environmental research and how to assess and interpret models. They will be able to analyze model predictions and identify key factors influencing the results. Additionally, they will know ways to integrate scientific knowledge into ML models to improve model reliability in research fields traditionally guided by physical principles.
While Python will be used for demonstrations, participants will focus on understanding concepts rather than programming.
WORKSHOP CONTENT
- An overview of how ML works, focusing on core concepts rather than coding details.
- The role (potential and limitations) of different approaches to improving the transparency and scientific reliability of ML models in Earth and environmental sciences.
- Introduction to key techniques such as SHAP values and partial dependence plots for explainability, alongside physics-informed methods for integrating scientific constraints into ML models.
- Practical exploration in Python, using prepared scripts to analyze and discuss ML model outputs.
TARGET AUDIENCE & PRIOR KNOWLEDGE
This training is for researchers in physics-driven sciences who want to explore explainability techniques in machine learning. Participants should be comfortable reading advanced Python code, but prior experience with machine learning is not required. The Quickstart Python for Quantitative Data training can provide basic preparation.
TECHNICAL REQUIREMENTS
ABOUT THE TRAINER
Annika Nolte is a data scientist for training and consulting at the DSC. She holds a master’s degree in Environmental Sciences from the Technical University of Braunschweig (2019). She has over five years of experience in scientific programming, specializing in data management and processing, hydroinformatics, geospatial analysis, and AI for environmental modeling. In training and consulting, Annika draws on her research background and broad interdisciplinary expertise in Earth system sciences.