OBJECTIVES
Recent breakthroughs in text mining and natural language processing (NLP) have been driven mainly by three key factors: new deep learning frameworks, better computational resources, and access to larger amounts of data (Big Data). In this workshop, we start with the basics of text processing in Python and learn about classical feature engineering for text data in machine learning. We then look at word embeddings (word vectors) and their integration into deep learning architectures such as RNNs. We also delve into the “attention” mechanism and transfer learning, key components of state-of-the-art models like BERT & Co.
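To give a flavour of the classical feature engineering covered at the start of the workshop, the following minimal sketch builds a bag-of-words / TF-IDF representation with scikit-learn. The example documents and parameter choices are illustrative assumptions, not workshop material.

# Minimal sketch (illustrative only): classical bag-of-words / TF-IDF features
# with scikit-learn; the example documents are made up for demonstration.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Text mining turns raw documents into numeric features.",
    "Word embeddings map words to dense vectors.",
]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(docs)            # sparse document-term matrix
print(X.shape)                                # (2, size of learned vocabulary)
print(vectorizer.get_feature_names_out())     # the learned vocabulary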
WORKSHOP CONTENT
- We demonstrate the importance of NLP with some examples, followed by an introduction to handling text data and its possible representations in machine learning (ML). We also briefly introduce fully connected neural networks (FCNNs) as an important basis for the rest of the course.
- We focus on neural representations of text, starting with the idea of language modeling via the neural probabilistic language model (Bengio et al., 2003). We then introduce the Word2Vec framework (Mikolov et al., 2013), the Doc2Vec framework (Le and Mikolov, 2014), and the FastText framework (Bojanowski et al., 2017). Each framework is accompanied by a hands-on session for practical implementation of what has been learned (see the first sketch after this list).
- We focus on deep learning and current state-of-the-art architectures, take an in-depth look at existing transfer learning resources, and apply what we have learned in a final hands-on session (see the second sketch after this list).
- For the hands-on parts of the workshop, exercises are provided as Jupyter notebooks that participants work through themselves.
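As a flavour of the word-embedding hands-on sessions, this minimal sketch trains a small Word2Vec model with gensim. The toy sentences and hyperparameters are illustrative assumptions, not the workshop's own notebooks.

# Minimal sketch (illustrative only): training a small Word2Vec model with gensim.
from gensim.models import Word2Vec

sentences = [
    ["deep", "learning", "for", "text"],
    ["word", "embeddings", "capture", "meaning"],
    ["text", "mining", "with", "python"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)
print(model.wv["text"].shape)                 # 50-dimensional word vector
print(model.wv.most_similar("text", topn=2))  # nearest neighbours in vector space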
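Similarly, as a flavour of the transfer learning part, the second sketch applies a pretrained transformer through the Hugging Face transformers library. The task and default model choice are illustrative assumptions.

# Minimal sketch (illustrative only): using a pretrained transformer via the
# Hugging Face `transformers` library; the default model is downloaded on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("The attention mechanism makes transfer learning remarkably effective.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': ...}]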
TARGET AUDIENCE
PhD students and postdocs with basic knowledge of Python and supervised machine learning methods. The workshop is very hands-on and thus limited to max. 15 participants.
TECHNICAL REQUIREMENTS
Use a laptop/PC with reliable internet access and install the following software:
ABOUT THE TRAINER
Dr. Matthias Assenmacher works as a trainer for Data Science Essential GmbH and is a postdoctoral researcher at the Chair of Statistical Learning and Data Science (LMU) and the NFDI Consortium for Business, Economic and Related Data (BERD@NFDI). He obtained his bachelor’s degree in Economics from LMU in 2014; afterwards he turned to Statistics (with a focus on social and economic studies) and obtained his master’s degree in 2017 (also from LMU). In October 2021 he completed his PhD with a focus on Natural Language Processing.
His expertise revolves around the practical application of state-of-the-art NLP architectures to real-world problems from various disciplines, as well as open and reproducible science.