Context specific text mining for annotating protein interactions with experimental evidence

dc.contributor.advisor: Palakal, Mathew J.
dc.contributor.author: Pandit, Yogesh
dc.contributor.other: Liu, Yunlong
dc.contributor.other: Liu, Xiaowen
dc.date.accessioned: 2014-01-03T16:03:12Z
dc.date.available: 2014-01-03T16:03:12Z
dc.date.issued: 2014-01-03
dc.degree.date: 2013 [en_US]
dc.degree.discipline: School of Informatics [en]
dc.degree.grantor: Indiana University [en_US]
dc.degree.level: M.S. [en_US]
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) [en_US]
dc.description.abstract: Proteins are the building blocks of a biological system. They interact with other proteins to produce distinct biological phenomena, and understanding these protein-protein interactions is key to understanding the molecular mechanisms at work in any biological system. Protein interaction databases are a rich source of protein interaction information. They gather large amounts of information from the published literature, and most of this work is done manually by expert curators. Manual annotation is time-consuming, and the publicly available literature is growing too rapidly to be handled by manual curation alone. Tools are needed to process these large amounts of data and extract the key information that helps curators work faster. When extracting evidence of protein-protein interactions from the literature, the mere mention of a protein, as found by look-up approaches, cannot validate an interaction. Supporting protein interaction information with experimental evidence can. In this study, we apply machine learning based classification techniques to classify a given protein interaction related document into an interaction detection method. We use biological attributes and experimental factors, different combinations of which define any particular interaction detection method. Then, using the predicted detection methods and the proteins identified by named entity recognition techniques, and decomposing the parts-of-speech composition, we search for sentences with experimental evidence for a protein-protein interaction. We report an accuracy of 75.1% with an F-score of 47.6% on a dataset containing 2035 training documents and 300 test documents. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/3809
dc.identifier.uri: http://dx.doi.org/10.7912/C2/928
dc.language.iso: en_US [en_US]
dc.subject.lcsh: Protein-protein interactions -- Databases -- Research [en_US]
dc.subject.lcsh: Data mining -- Analysis [en_US]
dc.subject.lcsh: Systems biology -- Methodology [en_US]
dc.subject.lcsh: Bioinformatics -- Information resources -- Research [en_US]
dc.subject.lcsh: Machine learning -- Methodology -- Research [en_US]
dc.subject.lcsh: Computational intelligence -- Research -- Methodology -- Analysis [en_US]
dc.subject.lcsh: Information storage and retrieval systems [en_US]
dc.subject.lcsh: Natural language processing (Computer science) -- Research [en_US]
dc.subject.lcsh: Biology -- Data processing -- Research [en_US]
dc.subject.lcsh: Artificial intelligence -- Medical applications -- Research [en_US]
dc.title: Context specific text mining for annotating protein interactions with experimental evidence [en_US]
dc.type: Thesis [en]
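
The final step described in the abstract, searching for sentences that carry experimental evidence for an interaction, can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes the detection method has already been predicted and the protein names already identified, and it substitutes plain keyword matching for the machine learning classifier, the named entity recognition, and the parts-of-speech analysis. The cue terms in METHOD_CUES and the function names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical cue terms for two detection methods (illustrative only,
# not taken from the thesis).
METHOD_CUES = {
    "two hybrid": ["two-hybrid", "two hybrid", "y2h"],
    "coimmunoprecipitation": ["co-immunoprecipitat", "coimmunoprecipitat", "co-ip"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def evidence_sentences(text, proteins, method):
    """Return sentences that name at least two of the given proteins
    and contain a cue term for the given detection method."""
    cues = METHOD_CUES[method]
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Whole-word match so that a short name does not match inside a longer one.
        mentioned = {
            p for p in proteins
            if re.search(r"\b" + re.escape(p.lower()) + r"\b", lowered)
        }
        has_cue = any(cue in lowered for cue in cues)
        if len(mentioned) >= 2 and has_cue:
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    sample = (
        "BRCA1 is a tumor suppressor. "
        "A yeast two-hybrid screen showed that BRCA1 binds BARD1. "
        "BARD1 was also studied."
    )
    for hit in evidence_sentences(sample, ["BRCA1", "BARD1"], "two hybrid"):
        print(hit)
```

Requiring both a method cue and two protein mentions in the same sentence reflects the abstract's point that a mere mention of a protein is not enough to validate an interaction.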
Files
Original bundle
Name: ypandit_bioinfo_thesis.pdf
Size: 10.76 MB
Format: Adobe Portable Document Format
Description: Yogesh Pandit's Thesis for MS in Bioinformatics
License bundle
Name: license.txt
Size: 1.88 KB
Format: Item-specific license agreed upon to submission