BERD Workshop: An Introduction to Text Mining for Scientific Inquiry

The language that humans use to interact, communicate, and record their thoughts and experiences carries a great deal of information of interest to the health and behavioral sciences. For example, individuals may talk about their symptoms, discuss their emotions, record notes from clinical meetings, or post on social media about their experiences. Unfortunately, this important data often goes unused in scientific analysis because its unstructured nature can make it difficult to model with conventional tools.

Text mining tools can extract useful information and patterns from unstructured text data in a form that can be easily understood and statistically modeled. This workshop, presented by Timothy Brick, Ph.D., introduces common tools for text mining, ranging from simple word counts to the transformer models that underlie ChatGPT, and highlights the ways they can be used to answer questions in the behavioral and health sciences. Topics will include the following:

Finding common topics from a person’s daily reflections
Extracting sentiment and emotions from social media comments
Automatically generating summaries of meeting notes or scientific articles
Creating and modeling text in new ways with deep word embeddings

Code examples in R and R Markdown will be provided.

Download

IntroTextMining-BERD-Su2024-Part1.pdf (806.68 KB)

IntroTextMining-BERD-Su2024-Part2.pdf (995.33 KB)

IntroTextMining-BERD-Su2024-Part3.pdf (1.62 MB)

Contributors

Timothy R. Brick