A Case Study for Massive Text Mining: K Nearest Neighbor Algorithm on PubMed data

dc.contributor.author: Do, Nhan
dc.contributor.author: Dundar, Murat
dc.date.accessioned: 2016-03-10T19:30:46Z
dc.date.available: 2016-03-10T19:30:46Z
dc.date.issued: 2015-04-17
dc.description: poster abstract [en_US]
dc.description.abstract: The US National Library of Medicine (NLM) holds a huge collection of millions of books, journals, and other publications related to the medical domain. NLM created a database called MEDLINE to store citations and link them to the publications, which lets researchers and students find medical articles easily. The public can search MEDLINE using a database called PubMed. When new PubMed documents become available online, curators have to assign their labels manually. The process is tedious and time-consuming because there are more than 27,149 descriptors (Medical Subject Headings, or MeSH terms). Although curators already use a system called the Medical Text Indexer (MTI) to suggest MeSH terms, its performance needs to be improved. This research explores using text classification to annotate new PubMed documents automatically, efficiently, and with reasonable accuracy. The data are gathered from the BioASQ contest and contain 4 million abstracts. The research process consists of preprocessing the data, reducing the feature space, classifying the documents, and evaluating the results. In this case study, we focus on the k-nearest neighbor (KNN) algorithm. [en_US]
dc.identifier.citation: Nhan Do and Murat Dundar. (2015, April 17). A Case Study for Massive Text Mining: K Nearest Neighbor Algorithm on PubMed data. Poster session presented at IUPUI Research Day 2015, Indianapolis, Indiana. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/8800
dc.language.iso: en_US [en_US]
dc.publisher: Office of the Vice Chancellor for Research [en_US]
dc.subject: US National Library of Medicine (NLM) [en_US]
dc.subject: MEDLINE [en_US]
dc.subject: PubMed [en_US]
dc.subject: text classification [en_US]
dc.title: A Case Study for Massive Text Mining: K Nearest Neighbor Algorithm on PubMed data [en_US]
dc.type: Poster [en_US]
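The abstract lists four steps (preprocess, reduce the feature space, classify, evaluate) without giving the concrete choices. The following is a minimal Python sketch of one such pipeline, not the authors' implementation. It assumes simple lowercase tokenization, document-frequency-based feature selection, TF-IDF weighting, cosine-similarity KNN that keeps a MeSH label when enough neighbors carry it, and micro-averaged F1 as the metric. All function names, the parameters `k`, `min_votes`, and `max_features`, and the toy abstracts and label sets are hypothetical and are not BioASQ data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Preprocessing (assumed): lowercase, then split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_idf(docs, max_features):
    """Feature-space reduction (assumed): keep the max_features terms with the highest
    document frequency and return a smoothed IDF weight for each kept term."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Highest document frequency first; ties broken alphabetically for determinism.
    kept = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:max_features]
    n = len(docs)
    return {term: math.log((1 + n) / (1 + freq)) + 1.0 for term, freq in kept}


def tfidf(tokens, idf):
    """L2-normalized sparse TF-IDF vector as a dict {term: weight}; unknown terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors (i.e. their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_labels(query, train_vecs, train_labels, k, min_votes):
    """Return every label carried by at least min_votes of the k most similar training docs."""
    ranked = sorted(range(len(train_vecs)), key=lambda i: -cosine(query, train_vecs[i]))
    votes = Counter()
    for i in ranked[:k]:
        votes.update(train_labels[i])
    return {label for label, v in votes.items() if v >= min_votes}


def micro_f1(gold, predicted):
    """Micro-averaged F1 over all (document, label) pairs."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in predicted)
    recall = tp / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy abstracts and MeSH-style label sets (not BioASQ data).
train_texts = [
    "insulin resistance in type 2 diabetes patients",
    "glucose and insulin levels in diabetes",
    "tumor growth in breast cancer cells",
    "chemotherapy response of breast cancer tumors",
]
train_labels = [
    {"Diabetes Mellitus", "Insulin"},
    {"Diabetes Mellitus", "Blood Glucose"},
    {"Breast Neoplasms"},
    {"Breast Neoplasms", "Drug Therapy"},
]
test_texts = ["insulin therapy for diabetes", "breast cancer tumor imaging"]
test_gold = [{"Diabetes Mellitus", "Insulin"}, {"Breast Neoplasms"}]

train_tokens = [tokenize(t) for t in train_texts]
idf = build_idf(train_tokens, max_features=1000)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]

preds = [
    knn_labels(tfidf(tokenize(t), idf), train_vecs, train_labels, k=2, min_votes=2)
    for t in test_texts
]
score = micro_f1(test_gold, preds)
print(preds, round(score, 3))
```

With `k=2` and `min_votes=2`, a label is kept only when both neighbors agree. This favors precision over recall: on the toy data the first query recovers "Diabetes Mellitus" but misses "Insulin". Brute-force ranking of every training vector per query is shown only for clarity. At the scale of 4 million abstracts, an inverted index or an approximate nearest-neighbor search would likely be needed.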
Files
Original bundle
Name: Do-Case.pdf
Size: 7.44 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.88 KB
Format: Item-specific license agreed upon to submission