Luo, Xiao
Shah, Setu
2019-01-02
2019-01-02
2018-11-01
Luo, X., & Shah, S. (2018). Concept embedding-based weighting scheme for biomedical text clustering and visualization. Applied Informatics, 5(1), 8. https://doi.org/10.1186/s40535-018-0055-8
2196-0089
https://hdl.handle.net/1805/18062
Biomedical text clustering is a text mining technique used to improve document search, browsing, and retrieval in biomedical and clinical text collections. This research explores a document representation based on concept embedding combined with a proposed weighting scheme. The concept embedding is learned through neural networks to capture the associations between concepts. The proposed weighting scheme uses these concept associations to build document vectors for clustering. We evaluate two types of concept embedding and the new weighting scheme for text clustering and visualization on two different biomedical text collections. The results demonstrate that the concept embedding combined with the new weighting scheme performs better than the baseline tf–idf for clustering and visualization. Based on an internal clustering evaluation metric, the Davies–Bouldin index, and on the visualization, the concept embedding generated from aggregated word embedding forms well-separated clusters, whereas the intact concept embedding identifies more clusters of specific diseases and achieves a better F-measure.
en-US
Attribution 3.0 United States
Biomedical text clustering; Visualization; Concept embedding; Neural networks
Concept embedding-based weighting scheme for biomedical text clustering and visualization
Article