Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec

Date
2017
Language
English
Found At
Springer
Abstract

We present a novel approach to learning distributed representations of sentences from unlabeled data by modeling both the content and the context of a sentence. The content model learns a sentence's representation by predicting its words. The context model, on the other hand, comprises a neighbor-prediction component and a regularizer, which model the distributional and proximity hypotheses, respectively. We propose an online algorithm to train the model components jointly. We evaluate the models in a setup where contextual information is available. Experimental results on sentence classification, clustering, and ranking tasks show that our model outperforms the best existing models by a wide margin across multiple datasets.
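
To make the joint training described in the abstract concrete, below is a minimal Python sketch of one online update that combines the three components (word prediction for content, neighbor prediction for the distributional hypothesis, and a proximity regularizer). The function names, the gradient form, and the learning-rate and weight values are illustrative assumptions chosen for exposition, not the authors' implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_update(phi, s, words, neighbors, word_vecs, lr=0.025, beta=1.0):
    """Hypothetical online update for sentence s (a sketch, not Con-S2V itself).

    phi       : dict mapping sentence id -> np.ndarray (sentence vectors)
    words     : ids of words occurring in s (content component)
    neighbors : ids of sentences adjacent to s (context component)
    word_vecs : dict mapping word id -> np.ndarray (word embeddings)
    beta      : weight of the proximity regularizer
    """
    v = phi[s]
    grad = np.zeros_like(v)
    # Content component: raise the score of the sentence's own words
    # (a full implementation would also draw negative samples).
    for w in words:
        grad += (1.0 - sigmoid(v @ word_vecs[w])) * word_vecs[w]
    for n in neighbors:
        # Neighbor prediction: model the distributional hypothesis by
        # raising the score of adjacent sentences.
        grad += (1.0 - sigmoid(v @ phi[n])) * phi[n]
        # Proximity regularizer: pull vectors of adjacent sentences together.
        grad -= beta * (v - phi[n])
    phi[s] = v + lr * grad  # gradient ascent on the joint objective

In such a scheme the update is applied to sentences as they stream in, which is what makes the joint training online.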

Cite As
Saha, T. K., Joty, S., & Hasan, M. A. (2017). Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec. In Machine Learning and Knowledge Discovery in Databases (pp. 753–769). Springer, Cham. https://doi.org/10.1007/978-3-319-71249-9_45
Journal
Machine Learning and Knowledge Discovery in Databases
Author
Saha, T. K.; Joty, S.; Hasan, M. A.
Type
Article
Version
Author's manuscript