A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers

dc.contributor.author: Allen, Robert B
dc.contributor.author: Copeland, Andrea J.
dc.contributor.author: Achananuparp, Palakorn
dc.contributor.author: Lee, Ki Jung
dc.date.accessioned: 2014-06-19T17:14:10Z
dc.date.available: 2014-06-19T17:14:10Z
dc.date.issued: 2007
dc.description.abstract: Large quantities of historical newspapers are being digitized and OCRd. We describe a framework for processing the OCRd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of them. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events. [en_US]
dc.identifier.citation: Allen, R. B., Japzon, A., Achananuparp, P., & Lee, K. J. (2007). A framework for text processing and supporting access to collections of digitized historical newspapers. In Human Interface and the Management of Information. Interacting in Information Environments (pp. 235-244). Springer Berlin Heidelberg. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/4552
dc.language.iso: en_US [en_US]
dc.subject: text processing [en_US]
dc.subject: historical newspapers [en_US]
dc.subject: digitization [en_US]
dc.title: A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers [en_US]
dc.type: Book chapter [en_US]
Files
Original bundle
Name: allen-2007-framework.pdf
Size: 471.7 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.88 KB
Format: Item-specific license agreed upon to submission