Regularized and Retrofitted Models for Learning Sentence Representation with Context

Vector representations of sentences are important for many text processing tasks that involve classifying, clustering, or ranking sentences. Bag-of-words representations have long been used for these tasks. In recent years, distributed representations of sentences learned by neural models from unlabeled data have been shown to outperform traditional bag-of-words representations. However, most existing neural methods consider only the content of a sentence and disregard its relations with other sentences in the context. In this paper, we first characterize two types of contexts depending on their scope and utility. We then propose two approaches to incorporate contextual information into content-based models. We evaluate our sentence representation models in a setup where context is available to infer sentence vectors. Experimental results demonstrate that our proposed models outperform existing models on three fundamental tasks: classifying, clustering, and ranking sentences.

Keywords

Sen2Vec, distributed representation of sentences, feature learning

Cite As

Saha, T. K., Joty, S., Hassan, N., & Hasan, M. A. (2017). Regularized and Retrofitted Models for Learning Sentence Representation with Context. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 547–556). New York, NY, USA: ACM. https://doi.org/10.1145/3132847.3133011

Journal

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Rights

Publisher Policy

Type

Article

Permanent Link

https://hdl.handle.net/1805/17156

DOI

https://doi.org/10.1145/3132847.3133011

Version

Author's manuscript

Collections

Open Access Policy Articles
Department of Computer and Information Science Works
Department of Computer Science Works
