Concept embedding-based weighting scheme for biomedical text clustering and visualization

Date
2018-11-01
Language
American English
Found At
BioMed Central
Abstract

Biomedical text clustering is a text mining technique that supports document search, browsing, and retrieval in biomedical and clinical text collections. This research explores a document representation based on concept embedding combined with a proposed weighting scheme. The concept embedding is learned with neural networks to capture the associations between concepts. The proposed weighting scheme uses these concept associations to build document vectors for clustering. We evaluate two types of concept embedding and the new weighting scheme for text clustering and visualization on two different biomedical text collections. The results demonstrate that concept embedding combined with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on an internal clustering evaluation metric, the Davies–Bouldin index, and on the visualization, the concept embedding generated from aggregated word embeddings can form well-separated clusters, whereas the intact concept embedding can identify more clusters of specific diseases and achieves a better F-measure.
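
As a rough illustration of the two ideas named in the abstract, the following minimal sketch builds a document vector as a weighted average of concept embeddings and scores a clustering with the Davies–Bouldin index (lower values indicate better-separated clusters). The abstract does not give the weighting formula, so the `weights` argument here is only a hypothetical placeholder, not the authors' scheme; the function names, concept identifiers, and toy data are likewise assumptions made for this example.

```python
import math
from collections import Counter


def document_vector(concepts, embedding, weights=None):
    """Weighted average of the embeddings of the concepts found in one document.

    `weights` maps a concept to a multiplier. It is a hypothetical stand-in
    for a weighting scheme; the paper's actual formula is not reproduced here.
    """
    counts = Counter(c for c in concepts if c in embedding)
    if not counts:
        raise ValueError("document contains no concept with an embedding")
    dim = len(next(iter(embedding.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for concept, count in counts.items():
        w = count * (weights.get(concept, 1.0) if weights else 1.0)
        for i, x in enumerate(embedding[concept]):
            total[i] += w * x
        weight_sum += w
    return [t / weight_sum for t in total]


def davies_bouldin(points, labels):
    """Davies–Bouldin index using Euclidean distance and cluster centroids."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    worst_ratios = []
    for i in clusters:
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        ))
    return sum(worst_ratios) / len(worst_ratios)


# Toy data (made up for illustration only).
toy_embedding = {"C1": [1.0, 0.0], "C2": [0.0, 1.0]}
doc_vec = document_vector(["C1", "C1", "C2"], toy_embedding)
toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
toy_score = davies_bouldin(toy_points, [0, 0, 1, 1])
```

In practice the document vectors would be passed to a clustering algorithm, and the index would be computed on the resulting cluster labels to compare representations.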

Cite As
Luo, X., & Shah, S. (2018). Concept embedding-based weighting scheme for biomedical text clustering and visualization. Applied Informatics, 5(1), 8. https://doi.org/10.1186/s40535-018-0055-8
ISSN
2196-0089
Journal
Applied Informatics
Type
Article
Version
Final published version