Comparison of Deep Learning based Concept Representations for Biomedical Document Clustering
Date
Authors
Language
Embargo Lift Date
Committee Members
Degree
Degree Year
Department
Grantor
Journal Title
Journal ISSN
Volume Title
Found At
Abstract
In this research, document representations based on distributed representations of the concepts along with new weighting schemes for the documents are explored. The baseline weighting scheme is the traditional Term Frequency-Inverse Document Frequency (TF-IDF) of the concepts, whereas, the other two newly proposed ones consider both local content using the TF-IDF and associations between concepts. The distributed representations of the concepts are measured using a deep learning algorithm. The evaluation of the proposed document representations is based on the k-means clustering results. The results show that document representation based on TF-IDF in combination with the term based distributed representations for concepts outperforms the other two based on the returned evaluation metrics - F1-measure (80.21%) and Purity (77.1%).