Concept embedding-based weighting scheme for biomedical text clustering and visualization

dc.contributor.author: Luo, Xiao
dc.contributor.author: Shah, Setu
dc.contributor.department: Computer Information and Graphics Technology, School of Engineering and Technology [en_US]
dc.date.accessioned: 2019-01-02T15:55:11Z
dc.date.available: 2019-01-02T15:55:11Z
dc.date.issued: 2018-11-01
dc.description.abstract: Biomedical text clustering is a text mining technique used to improve document search, browsing, and retrieval in biomedical and clinical text collections. This research explores a document representation based on concept embedding together with a proposed weighting scheme. The concept embedding is learned through neural networks to capture the associations between concepts. The proposed weighting scheme uses these concept associations to build document vectors for clustering. We evaluate two types of concept embedding and the new weighting scheme for text clustering and visualization on two different biomedical text collections. The results demonstrate that the concept embedding with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on an internal clustering evaluation metric, the Davies–Bouldin index, and on the visualization, the concept embedding generated from aggregated word embedding forms well-separated clusters, whereas the intact concept embedding better identifies more clusters of specific diseases and achieves a better F-measure. [en_US]
dc.eprint.version: Final published version [en_US]
dc.identifier.citation: Luo, X., & Shah, S. (2018). Concept embedding-based weighting scheme for biomedical text clustering and visualization. Applied Informatics, 5(1), 8. https://doi.org/10.1186/s40535-018-0055-8 [en_US]
dc.identifier.issn: 2196-0089 [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/18062
dc.language.iso: en_US [en_US]
dc.publisher: BioMed Central [en_US]
dc.relation.isversionof: 10.1186/s40535-018-0055-8 [en_US]
dc.relation.journal: Applied Informatics [en_US]
dc.rights: Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.source: Publisher [en_US]
dc.subject: Biomedical text clustering [en_US]
dc.subject: Visualization [en_US]
dc.subject: Concept embedding [en_US]
dc.subject: Neural networks [en_US]
dc.title: Concept embedding-based weighting scheme for biomedical text clustering and visualization [en_US]
dc.type: Article [en_US]
Files
Original bundle
Name: document.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format
Description: Article
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission