A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers

Date
2007
Language
American English
Abstract

Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract their metadata. We describe the article schema and provide examples of features that facilitate automatic indexing of the articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.

Cite As
Allen, R. B., Japzon, A., Achananuparp, P., & Lee, K. J. (2007). A framework for text processing and supporting access to collections of digitized historical newspapers. In Human Interface and the Management of Information. Interacting in Information Environments (pp. 235-244). Springer Berlin Heidelberg.
Type
Book chapter