Open Set Authorship Attribution Toward Demystifying Victorian Periodicals

Existing research in computational authorship attribution (AA) has primarily focused on attribution tasks with a limited number of authors in a closed-set configuration. This restricted setup is far from realistic for highly entangled real-world AA tasks that involve a large number of candidate authors at test time. In this paper, we study AA in historical texts using a new data set compiled from Victorian literature. We investigate the predictive capacity of the most common English words in distinguishing the writings of the most prominent Victorian novelists. We challenge the closed-set classification assumption and discuss the limitations of standard machine learning techniques in dealing with the open-set AA task. Our experiments suggest that a linear classifier can achieve near-perfect attribution accuracy under the closed-set assumption, yet the need for more robust approaches becomes evident once a large candidate pool has to be considered in the open-set classification setting.
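To make the closed-set versus open-set distinction concrete, the following is a minimal sketch and not the method used in the paper. It assumes a tiny hypothetical function-word list, swaps in a nearest-centroid rule with cosine similarity in place of the paper's classifiers, and uses an arbitrary similarity threshold to reject texts that match no known author. The names `FUNCTION_WORDS`, `features`, `fit_centroids`, `attribute`, and the `threshold` parameter are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical, tiny word list for illustration; the paper's actual
# vocabulary of common English words is not reproduced here.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "i"]


def features(text):
    """Relative frequency of each function word in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fit_centroids(labeled_texts):
    """Average the feature vectors of each known author (assumed stand-in model)."""
    grouped = {}
    for author, text in labeled_texts:
        grouped.setdefault(author, []).append(features(text))
    return {
        author: [sum(col) / len(vecs) for col in zip(*vecs)]
        for author, vecs in grouped.items()
    }


def attribute(text, centroids, threshold=None):
    """Return the most similar known author.

    With threshold=None this is closed-set: some known author is always returned.
    With a threshold this is open-set: a low best score yields "unknown".
    """
    x = features(text)
    best_author, best_score = max(
        ((author, cosine(x, c)) for author, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    if threshold is not None and best_score < threshold:
        return "unknown"
    return best_author
```

In closed-set mode the function always names one of the training authors, even for a text that resembles none of them; passing a `threshold` lets it decline to attribute, which is the behavior an open-set setting with a large candidate pool calls for.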

Keywords

Author attribution, Open-set classification, Victorian literature

Cite As

Badirli, S., Borgo Ton, M., Gungor, A., & Dundar, M. (2021). Open Set Authorship Attribution Toward Demystifying Victorian Periodicals. In J. Lladós, D. Lopresti, & S. Uchida (Eds.), Document Analysis and Recognition – ICDAR 2021 (Vol. 12824, pp. 221–235). Springer International Publishing. https://doi.org/10.1007/978-3-030-86337-1_15

ISBN

978-3-030-86336-4 978-3-030-86337-1

Journal

Document Analysis and Recognition – ICDAR 2021

Rights

Publisher Policy

Source

ArXiv

Type

Article

Permanent Link

https://hdl.handle.net/1805/32619

DOI

https://doi.org/10.1007/978-3-030-86337-1_15

Version

Author's manuscript

Collections

Open Access Policy Articles
Department of Computer and Information Science Works
