Extracting Symptoms from Narrative Text using Artificial Intelligence

dc.contributor.advisor: Zou, Xukai
dc.contributor.author: Gandhi, Priyanka
dc.contributor.other: Luo, Xiao
dc.contributor.other: Xia, Yuni
dc.date.accessioned: 2021-01-05T18:36:06Z
dc.date.available: 2021-01-05T18:36:06Z
dc.date.issued: 2020-12
dc.degree.date: 2020 [en_US]
dc.degree.discipline: Computer & Information Science
dc.degree.grantor: Purdue University [en_US]
dc.degree.level: M.S. [en_US]
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) [en_US]
dc.description.abstract: Electronic health records collect an enormous amount of data about patients. However, the information about a patient’s illness is stored in progress notes that are in an unstructured format, and it is difficult for humans to annotate symptoms listed in the free text. Recently, researchers have explored how advances in deep learning can be applied to process biomedical data. The information in the text can be extracted with the help of natural language processing. The research presented in this thesis aims at automating the process of symptom extraction. The proposed methods use pre-trained word embeddings such as BioWord2Vec, BERT, and BioBERT to generate word vectors based on the semantics and syntactic structure of sentences. BioWord2Vec embeddings are fed into a BiLSTM neural network with a CRF layer to capture the dependencies between correlated terms in the sentence. The pre-trained BERT and BioBERT embeddings are fed into the BERT model with a CRF layer to analyze the output tags of neighboring tokens. The research shows that with the help of the CRF layer in neural network models, longer symptom phrases can be extracted from the text. The proposed models are compared with the UMLS MetaMap tool, which uses various sources to categorize the terms in the text into different semantic types, and with Stanford CoreNLP, a dependency parser that analyzes syntactic relations in the sentence to extract information. The performance of the models is analyzed using strict, relaxed, and n-gram evaluation schemes. The results show that BioBERT with a CRF layer can extract the majority of the human-labeled symptoms. Furthermore, the model is used to extract symptoms from COVID-19 tweets. The model was able to extract symptoms listed by the CDC as well as new symptoms. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/24759
dc.identifier.uri: http://dx.doi.org/10.7912/C2/2377
dc.language.iso: en_US [en_US]
dc.subject: Artificial Intelligence [en_US]
dc.subject: Neural Network [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Medical Dataset [en_US]
dc.title: Extracting Symptoms from Narrative Text using Artificial Intelligence [en_US]
dc.type: Thesis [en]
thesis.degree.discipline: Computer & Information Science [en]
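
The abstract mentions strict and relaxed evaluation schemes without defining them. As an illustration only, the sketch below assumes the common reading in which a strict match requires identical span boundaries and a relaxed match requires any token overlap; these definitions, the function names, and the example spans are assumptions and are not taken from the thesis. The n-gram scheme is not sketched because the abstract gives no detail on it.

```python
# Illustrative sketch only: strict vs. relaxed span matching for extracted symptoms.
# Spans are (start, end) token offsets with an exclusive end. The definitions below
# are assumed conventions, not the thesis's own evaluation code.

def strict_match(pred, gold):
    """Strict (assumed): predicted span has exactly the gold boundaries."""
    return pred == gold

def relaxed_match(pred, gold):
    """Relaxed (assumed): predicted span overlaps the gold span by at least one token."""
    return pred[0] < gold[1] and gold[0] < pred[1]

def precision_recall_f1(predicted, gold, match):
    """Score predicted spans against gold spans under the given match function."""
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(match(p, g) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical example: two human-labeled symptom spans, three model predictions.
gold_spans = [(0, 3), (5, 6)]
pred_spans = [(1, 3), (5, 6), (8, 9)]

strict_scores = precision_recall_f1(pred_spans, gold_spans, strict_match)
relaxed_scores = precision_recall_f1(pred_spans, gold_spans, relaxed_match)
print("strict :", strict_scores)
print("relaxed:", relaxed_scores)
```

Under these assumed definitions, a prediction that captures only part of a longer symptom phrase counts as a hit under the relaxed scheme but as a miss under the strict one, which is why the two schemes can diverge for multi-word symptoms.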
Files
Original bundle
Name: Thesis_Priyanka_Gandhi.pdf
Size: 4.26 MB
Format: Adobe Portable Document Format