RussianFlu-DE

a German corpus for a historical epidemic with temporal annotation

Tran Van Canh, Katja Markert, Wolfgang Nejdl

pp. 61-73

Abstract

Temporally annotated corpora about historic events can be crucial to digital humanities research: they make it possible to extract and date events and the reactions to them, and to construct timelines of events and of language use, among other applications. However, producing a precise corpus for a particular historical event is very challenging because noise-free digitized data is scarce. This paper introduces RussianFlu-DE, a temporally annotated corpus of 639 articles extracted from the noisy OCR text of German-language newspaper issues. All articles concern the Russian flu epidemic of 1889–1893. We describe the development of RussianFlu-DE, including methods for cleaning different types of noise in the OCR text and our tool for extracting articles related to the Russian flu. In addition, we discuss the task of temporal annotation using the TIMEX2 schema and present the characteristics of the corpus in comparison with other corpora. To show how our contribution supports epidemiology, we present some preliminary results obtained from analyzing the articles in RussianFlu-DE. The corpus and associated tools for exploration are publicly available.

Publication details

Published in:

Kamps Jaap, Tsakonas Giannis, Manolopoulos Yannis, Iliadis Lazaros, Karydis Ioannis (eds.) (2017) Research and advanced technology for digital libraries: 21st international conference on theory and practice of digital libraries, TPDL 2017, Thessaloniki, Greece, September 18–21, 2017. Dordrecht, Springer.

Pages: 61-73

DOI: 10.1007/978-3-319-67008-9_6

Full citation:

Van Canh Tran, Markert Katja, Nejdl Wolfgang (2017) "RussianFlu-DE: a German corpus for a historical epidemic with temporal annotation", in: J. Kamps, G. Tsakonas, Y. Manolopoulos, L. Iliadis & I. Karydis (eds.), Research and advanced technology for digital libraries, Dordrecht, Springer, 61–73.