TY - JOUR
T1 - Building the Bridge
T2 - Topic Modeling for Comparative Research
AU - Lind, Fabienne
AU - Eberl, Jakob-Moritz
AU - Eisele, Olga
AU - Heidenreich, Tobias
AU - Galyga, Sebastian
AU - Boomgaarden, Hajo
N1 - Publisher Copyright: © 2021 The Author(s). Published with license by Taylor & Francis Group, LLC.
PY - 2022
Y1 - 2022
AB - In communication research, topic modeling is primarily used for discovering systematic patterns in monolingual text corpora. To advance the usage, we provide an overview of recently presented strategies to extract topics from multilingual text collections for the purpose of comparative research. Moreover, we discuss, demonstrate, and facilitate the usability of the "Polylingual Topic Model" (PLTM) for such analyses. The appeal of this model is that it derives lists of related clustered words in different languages with little reliance on translation or multilingual dictionaries and without the need for manual post-hoc matching of topics. PLTM bridges the gap between languages by making use of document connections in training documents. As these training documents are the crucial resource for the model, we compare model evaluation metrics for different strategies to build training documents. By discussing the advantages and limitations of the different strategies in respect to different scenarios, our study contributes to the methodological discussion on automated content analysis of multilingual text corpora.
UR - http://www.scopus.com/inward/record.url?scp=85114457394&partnerID=8YFLogxK
U2 - 10.1080/19312458.2021.1965973
DO - 10.1080/19312458.2021.1965973
M3 - Article
SN - 1931-2458
VL - 16
SP - 96
EP - 114
JO - Communication Methods & Measures
JF - Communication Methods & Measures
IS - 2
ER -