Applied Text Mining Using Python (DSC-2024-12)


04.12. - 06.12.2024


04.12. & 05.12.
9:00 AM - 12:30 PM &
2:00 PM - 5:00 PM

06.12.
9:00 AM - 12:30 PM


Trainings


Speaker:
Dr. Maryam Movahedifar
Data Science Center, University of Bremen

Location:
UNICOM 2 (Entrance Haus Oxford)
Mary-Sommerville-Str. 2
Room 2.1060 (First floor)

Number of Participants: Max. 20

Language: English







BACKGROUND

Given the rapid rate at which text data are being digitally gathered in many domains of science, there is a growing need for automated tools that can analyse, classify, and interpret this kind of data. Text mining techniques can be applied to create a structured representation of text, making its content more accessible to researchers. Applications of text mining are everywhere: social media, web search, advertising, emails, customer service, healthcare, marketing, etc.

WORKSHOP GOAL

This course offers an extensive exploration of text mining with Python. It has a strong practical, hands-on focus: students will gain experience in applying text mining to real data from, for example, the social sciences and healthcare, and in interpreting the results. Through lectures and practicals, students will learn the skills necessary to design, implement, and understand their own text mining pipeline.

The course deals with the following topics:
  • Review the fundamental approaches to text mining
  • Understand and apply current methods for analysing texts
  • Define a text mining pipeline given a practical data science problem
  • Implement all steps in a text mining pipeline: feature extraction, feature selection, model learning, model evaluation
  • Understand and apply state-of-the-art methods in text mining
The course starts by reviewing basic concepts of text mining and then moves on to implementing advanced concepts in natural language processing.
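
To give a rough idea of what the four pipeline steps listed above (feature extraction, feature selection, model learning, model evaluation) can look like in Python, here is a minimal, dependency-free sketch. It is not taken from the course materials: the toy corpus, the function names, the document-frequency thresholds, and the choice of a Naive Bayes classifier are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not course data): (text, label) pairs.
TRAIN = [
    ("the patient reported severe pain and fever", "health"),
    ("the doctor prescribed medicine for the fever", "health"),
    ("the nurse checked the patient in the clinic", "health"),
    ("voters discussed the election on social media", "social"),
    ("the survey asked voters about media trust", "social"),
    ("the election campaign used social networks", "social"),
]
TEST = [
    ("the patient has a fever", "health"),
    ("social media shaped the election", "social"),
]


def tokenize(text):
    """Feature extraction: lowercase the text and split it on whitespace."""
    return text.lower().split()


def select_features(docs, min_df=2, max_df_ratio=0.8):
    """Feature selection: keep tokens that occur in at least min_df documents
    but in no more than max_df_ratio of all documents."""
    doc_freq = Counter()
    for text, _ in docs:
        doc_freq.update(set(tokenize(text)))
    limit = max_df_ratio * len(docs)
    return {tok for tok, n in doc_freq.items() if min_df <= n <= limit}


def train_naive_bayes(docs, vocab):
    """Model learning: multinomial Naive Bayes with add-one smoothing."""
    class_docs = Counter()
    token_counts = defaultdict(Counter)
    for text, label in docs:
        class_docs[label] += 1
        token_counts[label].update(t for t in tokenize(text) if t in vocab)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(token_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        log_like = {
            t: math.log((token_counts[label][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, vocab, text):
    """Return the label with the highest log-probability score."""
    tokens = [t for t in tokenize(text) if t in vocab]
    scores = {
        label: log_prior + sum(log_like[t] for t in tokens)
        for label, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)


def accuracy(model, vocab, docs):
    """Model evaluation: share of documents whose label is predicted correctly."""
    correct = sum(predict(model, vocab, text) == label for text, label in docs)
    return correct / len(docs)


vocab = select_features(TRAIN)
model = train_naive_bayes(TRAIN, vocab)
print("selected features:", sorted(vocab))
print("test accuracy:", accuracy(model, vocab, TEST))
```

On a corpus this small the numbers mean little; the sketch only shows how the steps feed into each other. Real projects would typically rely on established libraries and on a proper train/test split.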

WORKSHOP CONTENT

PART 1
  • Basics of Python (Basic data types, Containers, Functions, Numpy, ...)
  • Practical exercise
PART 2
  • What is Text Mining?
  • Text Preprocessing
  • Vector Space Model
  • Practical exercise
PART 3
  • Classification basics
  • Text Classification Algorithms
  • Evaluating classifiers
  • Practical exercise
PART 4
  • How to do feature selection (FS) for text data?
  • Text Preprocessing
  • Is PCA a FS method for text?
  • Other methods?
  • Practical exercise
PART 5
  • What is text clustering?
  • What are the applications?
  • How to cluster text data?
  • Practical exercise

TARGET AUDIENCE & PRIOR KNOWLEDGE

This course is ideal for learners who are comfortable with Python programming, wish to acquire skills in text mining approaches, and have a foundational understanding of machine learning. Participants from various fields such as sociology, psychology, education, human development, marketing, business, biology, medicine, political science, and communication sciences will find this course beneficial.

TECHNICAL REQUIREMENTS

Participants are requested to bring their own laptop for the lab meetings and to make sure that they have an Internet connection in order to use Python in Google Colab.
Participants should have basic knowledge of data science and programming, and the motivation to script and program in Python.


ABOUT THE TRAINER

Maryam Movahedifar holds a PhD in Statistics and has extensive experience in Interpretable Machine Learning. With a strong foundation in statistical methods and practical experience in applying these techniques to real-world problems, she is well-equipped to teach complex machine learning concepts. Her expertise includes making advanced models understandable and accessible.




The Data Science Center is funded by:
BMBF and EU