Concept embedding-based weighting scheme for biomedical text clustering and visualization
dc.contributor.author | Luo, Xiao | |
dc.contributor.author | Shah, Setu | |
dc.contributor.department | Computer Information and Graphics Technology, School of Engineering and Technology | en_US |
dc.date.accessioned | 2019-01-02T15:55:11Z | |
dc.date.available | 2019-01-02T15:55:11Z | |
dc.date.issued | 2018-11-01 | |
dc.description.abstract | Biomedical text clustering is a text mining technique used to improve document search, browsing, and retrieval in biomedical and clinical text collections. This research explores a document representation based on concept embedding together with a proposed weighting scheme. The concept embedding is learned through neural networks to capture the associations between concepts. The proposed weighting scheme uses these concept associations to build document vectors for clustering. We evaluate two types of concept embedding and the new weighting scheme for text clustering and visualization on two different biomedical text collections. The results demonstrate that the concept embedding combined with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on an internal clustering evaluation metric (the Davies–Bouldin index) and the visualization, the concept embedding generated from aggregated word embedding forms well-separated clusters, whereas the intact concept embedding identifies more clusters of specific diseases and achieves a better F-measure. | en_US |
dc.eprint.version | Final published version | en_US |
dc.identifier.citation | Luo, X., & Shah, S. (2018). Concept embedding-based weighting scheme for biomedical text clustering and visualization. Applied Informatics, 5(1), 8. https://doi.org/10.1186/s40535-018-0055-8 | en_US |
dc.identifier.issn | 2196-0089 | en_US |
dc.identifier.uri | https://hdl.handle.net/1805/18062 | |
dc.language.iso | en_US | en_US |
dc.publisher | BioMed Central | en_US |
dc.relation.isversionof | 10.1186/s40535-018-0055-8 | en_US |
dc.relation.journal | Applied Informatics | en_US |
dc.rights | Attribution 3.0 United States | |
dc.rights.uri | https://creativecommons.org/licenses/by/3.0/us | |
dc.source | Publisher | en_US |
dc.subject | Biomedical text clustering | en_US |
dc.subject | Visualization | en_US |
dc.subject | Concept embedding | en_US |
dc.subject | Neural networks | en_US |
dc.title | Concept embedding-based weighting scheme for biomedical text clustering and visualization | en_US |
dc.type | Article | en_US |