Leben gelabelt. Computergestützte Inhaltverzeichnisse für Oral-History-Interviews


Philipp Bayerschmidt, Dennis Möbus

Abstract

Machine-based methods for content analysis are not only used to obtain statistical results; their quantitative output can also be read qualitatively to identify patterns and thematic trends in the texts under investigation. This essay documents how a corpus of life history interviews, compiled from collections on the Oral History portal, was used to create a thematic index for all interviews represented in the portal by means of topic modelling, a machine learning method. This thematic index supports cross-archive searches and provides a basis for thematic analysis. After an overview of the most common methods for automatic content indexing of texts, the entire process, from the compilation of the corpus to the finished topic index and sample tables of contents, is presented transparently. Setting the parameters, from the size of the sections to the number of topics, has so far been a major challenge. Automated procedures did not deliver clear results, which is why a qualitative approach is proposed in which the optimal number of topics is approximated and evaluated using a zoom-in and zoom-out procedure (scalable reading). A group of experienced oral historians then labelled the individual topics and grouped similar topics into clusters. Finally, the labelled topics were implemented as a register in the oh.d platform. On this basis, tables of contents can be created for all interviews.
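The last step, deriving a table of contents from labelled topics, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each interview section already carries a vector of topic weights from some topic model, that every topic has a human-assigned label, and that consecutive sections sharing the same dominant topic are merged into one entry. The labels, the weights, and the function names are hypothetical.

```python
from itertools import groupby

# Hypothetical labels, as a group of annotators might assign them to topic ids.
TOPIC_LABELS = {0: "Childhood and family", 1: "War and flight", 2: "Work life"}


def dominant_topic(weights):
    """Return the index of the topic with the highest weight in one section."""
    return max(range(len(weights)), key=lambda i: weights[i])


def table_of_contents(section_weights, labels):
    """Merge consecutive sections that share a dominant topic into one entry.

    Returns a list of (first_section, last_section, label) tuples.
    """
    dominant = [dominant_topic(w) for w in section_weights]
    toc = []
    position = 0
    for topic, run in groupby(dominant):
        length = len(list(run))
        toc.append((position, position + length - 1, labels[topic]))
        position += length
    return toc


# Hypothetical per-section topic weights for one interview (each row sums to 1).
interview = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]

toc = table_of_contents(interview, TOPIC_LABELS)
for start, end, label in toc:
    print(f"Sections {start}-{end}: {label}")
```

Taking only the single dominant topic per section is a simplifying assumption; a threshold on the weights or several labels per section would be equally plausible choices.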


Bibliography: Bayerschmidt, Philipp/Möbus, Dennis: Leben gelabelt. Computergestützte Inhaltverzeichnisse für Oral-History-Interviews, BIOS – Zeitschrift für Biographieforschung, Oral History und Lebensverlaufsanalysen, 1+2-2025, pp. 83-104.

Article Details

Published: March 2026
Open Access from: 2028-03-02
Open Access License: CC BY 4.0
