A Case Study for Massive Text Mining: K Nearest Neighbor Algorithm on PubMed data

Date
2015-04-17
Language
American English
Found At
Office of the Vice Chancellor for Research
Abstract

The U.S. National Library of Medicine (NLM) holds millions of books, journals, and other publications in the medical domain. NLM maintains a database called MEDLINE to store and link citations to these publications, which lets researchers and students find medical articles easily. The public can search MEDLINE through a database called PubMed. When new PubMed documents become available online, curators must manually assign labels to them. The process is tedious and time-consuming because there are more than 27,149 descriptors (MeSH terms) to choose from. Although curators already use a system called MTI to suggest MeSH terms, its performance needs to be improved. This research explores the use of text classification to annotate new PubMed documents automatically, efficiently, and with reasonable accuracy. The data come from the BioASQ contest and comprise 4 million abstracts. The research process consists of preprocessing the data, reducing the feature space, classifying the documents, and evaluating the results. This case study focuses on the K nearest neighbor algorithm.
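
The four steps named in the abstract (preprocess, reduce the feature space, classify, evaluate) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which term weighting, feature-reduction method, similarity measure, or evaluation metric was used, so TF-IDF weighting, a document-frequency cutoff (`min_df`), cosine similarity, a neighbor-vote threshold (`min_votes`), and micro-averaged F1 are all assumptions made here for illustration. The four training abstracts and their labels are invented toy data, and every function name is hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Preprocess: lowercase, keep alphabetic tokens longer than two characters."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if len(w) > 2]

def to_vector(tokens, idf):
    """Sparse, L2-normalized TF-IDF vector; terms outside the vocabulary are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def fit_idf(token_lists, min_df=1):
    """Reduce the feature space (assumed method): keep terms seen in >= min_df documents."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items() if c >= min_df}

def cosine(a, b):
    """Dot product of two normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def suggest_labels(query_vec, train_vecs, train_labels, k=2, min_votes=2):
    """Classify: labels carried by at least min_votes of the k most similar documents."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]), reverse=True)
    votes = Counter(label for i in ranked[:k] for label in train_labels[i])
    return sorted(label for label, c in votes.items() if c >= min_votes)

def micro_f1(predicted, gold):
    """Evaluate: micro-averaged F1 over all (document, label) pairs."""
    tp = sum(len(set(p) & set(g)) for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(set(p)) for p in predicted)
    recall = tp / sum(len(set(g)) for g in gold)
    return 2 * precision * recall / (precision + recall)

# Invented toy data standing in for labeled abstracts.
TRAIN_TEXTS = [
    "Insulin therapy improves glucose control in diabetes patients",
    "Glucose monitoring guides insulin dosing in type two diabetes",
    "Chemotherapy response in breast tumor patients",
    "Tumor growth under chemotherapy resistance in breast cancer",
]
TRAIN_LABELS = [
    ["Diabetes Mellitus", "Insulin"],
    ["Diabetes Mellitus", "Insulin"],
    ["Breast Neoplasms", "Antineoplastic Agents"],
    ["Breast Neoplasms", "Antineoplastic Agents"],
]

TRAIN_TOKENS = [tokenize(t) for t in TRAIN_TEXTS]
IDF = fit_idf(TRAIN_TOKENS, min_df=1)
TRAIN_VECS = [to_vector(tokens, IDF) for tokens in TRAIN_TOKENS]

def annotate(text, k=2, min_votes=2):
    """Run the whole pipeline on one new abstract."""
    return suggest_labels(to_vector(tokenize(text), IDF), TRAIN_VECS, TRAIN_LABELS, k, min_votes)

print(annotate("Glucose control with insulin in adults with diabetes"))
```

On a collection of millions of abstracts, the exhaustive scan in `suggest_labels` would be the bottleneck; an inverted index or an approximate nearest-neighbor search would be the usual replacement, which this sketch leaves out.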

Description
poster abstract
Cite As
Nhan Do and Murat Dundar. (2015, April 17). A Case Study for Massive Text Mining: K Nearest Neighbor Algorithm on PubMed data. Poster session presented at IUPUI Research Day 2015, Indianapolis, Indiana.
Type
Poster