A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers

Allen, Robert B; Copeland, Andrea J.; Achananuparp, Palakorn; Lee, Ki Jung

Files

allen-2007-framework.pdf (471.7 KB)

Date

2007

Authors

Allen, Robert B

Copeland, Andrea J.

Achananuparp, Palakorn

Lee, Ki Jung

Language

American English

Abstract

Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract their metadata. We describe the article schema and give examples of features that facilitate automatic indexing of the articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.

Keywords

text processing, historical newspapers, digitization

Cite As

Allen, R. B., Japzon, A., Achananuparp, P., & Lee, K. J. (2007). A framework for text processing and supporting access to collections of digitized historical newspapers. In Human Interface and the Management of Information. Interacting in Information Environments (pp. 235-244). Springer Berlin Heidelberg.

Type

Book chapter

Permanent Link

https://hdl.handle.net/1805/4552

Collections

Department of Library and Information Science Works
