IU Indianapolis ScholarWorks :: Browsing by Subject "Natural Language Processing"

Browsing by Subject "Natural Language Processing"

Now showing 1 - 10 of 20

A Deep Language Model for Symptom Extraction From Clinical Text and its Application to Extract COVID-19 Symptoms From Social Media
(IEEE, 2022) Luo, Xiao; Gandhi, Priyanka; Storey, Susan; Huang, Kun; Biostatistics and Health Data Science, School of Medicine
Patients experience various symptoms when they have either acute or chronic diseases or undergo some treatments for diseases. Symptoms are often indicators of the severity of the disease and the need for hospitalization. Symptoms are often described in free text written as clinical notes in the Electronic Health Records (EHR) and are not integrated with other clinical factors for disease prediction and healthcare outcome management. In this research, we propose a novel deep language model to extract patient-reported symptoms from clinical text. The deep language model integrates syntactic and semantic analysis for symptom extraction and identifies the actual symptoms reported by patients and conditional or negation symptoms. The deep language model can extract both complex and straightforward symptom expressions. We used a real-world clinical notes dataset to evaluate our model and demonstrated that our model achieves superior performance compared to three other state-of-the-art symptom extraction models. We extensively analyzed our model to illustrate its effectiveness by examining each component’s contribution to the model. Finally, we applied our model on a COVID-19 tweets data set to extract COVID-19 symptoms. The results show that our model can identify all the symptoms suggested by CDC ahead of their timeline and many rare symptoms.
Assessing Information Congruence of Documented Cardiovascular Disease between Electronic Dental and Medical Records
(2018) Patel, Jay; Mowery, Danielle; Krishnan, Anand; Thyvalikakath, Thankam
Dentists are more often treating patients with Cardiovascular Diseases (CVD) in their clinics; therefore, dentists may need to alter treatment plans in the presence of CVD. However, it’s unclear to what extent patient-reported CVD information is accurately captured in Electronic Dental Records (EDRs). In this pilot study, we aimed to measure the reliability of patient-reported CVD conditions in EDRs. We assessed information congruence by comparing patients’ self-reported dental histories to their original diagnosis assigned by their medical providers in the Electronic Medical Record (EMR). To enable this comparison, we encoded patients CVD information from the free-text data of EDRs into a structured format using natural language processing (NLP). Overall, our NLP approach achieved promising performance extracting patients’ CVD-related information. We observed disagreement between self-reported EDR data and physician-diagnosed EMR data.
Bridging The Gap Between Healthcare Providers and Consumers: Extracting Features from Online Health Forum to Meet Social Needs of Patients using Network Analysis and Embedding
(2020-08) Mokashi, Maitreyi; Chakraborty, Sunandan; Jones, Josette; Zheng, Jiaping
Chronic disease patients have to face many issues during and after their treatment. A lot of these issues are either personal, professional, or social in nature. It may so happen that these issues are overlooked by the respective healthcare providers and become major obstacles in the patient’s day-to-day life and their disease management. We extract data from an online health platform that serves as a ‘safe haven’ to the patients and survivors to discuss help and coping issues. This thesis presents a novel approach that acts as the first step to include the social issues discussed by patients on online health forums which the healthcare providers need to consider in order to create holistic treatment plans. There are numerous online forums where patients share their experiences and post questions about their treatments and their subsequent side effects. We collected data from an “Online Breast Cancer Forum”. On this forum, users (patients) have created threads across many related topics and shared their experiences and questions. We connect the patients (users) with the topic in which they have posted by converting the data into a bipartite network and turn the network nodes into a high-dimensional feature space. From this feature space, we perform community detection on the node embeddings to unearth latent connections between patients and topics. We claim that these latent connections, along with the existing ones, will help to create a new knowledge base that will eventually help the healthcare providers to understand and acknowledge the non-medical related issues to a treatment, and create more adaptive and personalized plans. We performed both qualitative and quantitative analysis on the obtained embeddings to prove the superior quality of our approach and its potential to extract more information when compared to other models.
Comparing Pso-Based Clustering Over Contextual Vector Embeddings to Modern Topic Modeling
(2022-05) Miles, Samuel; Ben Miled, Zina; Salama, Paul; El-Sharkawy, Mohamed
Efficient topic modeling is needed to support applications that aim at identifying main themes from a collection of documents. In this thesis, a reduced vector embedding representation and particle swarm optimization (PSO) are combined to develop a topic modeling strategy that is able to identify representative themes from a large collection of documents. Documents are encoded using a reduced, contextual vector embedding from a general-purpose pre-trained language model (sBERT). A modified PSO algorithm (pPSO) that tracks particle fitness on a dimension-by-dimension basis is then applied to these embeddings to create clusters of related documents. The proposed methodology is demonstrated on three datasets across different domains. The first dataset consists of posts from the online health forum r/Cancer. The second dataset is a collection of NY Times abstracts and is used to compare
Deep Learning Based Methods for Automatic Extraction of Syntactic Patterns and their Application for Knowledge Discovery
(2023-12-28) Kabir, Md. Ahsanul; Hasan, Mohammad Al; Mukhopadhyay, Snehasis; Tuceryan, Mihran; Fang, Shiaofen
Semantic pairs, which consist of related entities or concepts, serve as the foundation for comprehending the meaning of language in both written and spoken forms. These pairs enable to grasp the nuances of relationships between words, phrases, or ideas, forming the basis for more advanced language tasks like entity recognition, sentiment analysis, machine translation, and question answering. They allow to infer causality, identify hierarchies, and connect ideas within a text, ultimately enhancing the depth and accuracy of automated language processing. Nevertheless, the task of extracting semantic pairs from sentences poses a significant challenge, necessitating the relevance of syntactic dependency patterns (SDPs). Thankfully, semantic relationships exhibit adherence to distinct SDPs when connecting pairs of entities. Recognizing this fact underscores the critical importance of extracting these SDPs, particularly for specific semantic relationships like hyponym-hypernym, meronym-holonym, and cause-effect associations. The automated extraction of such SDPs carries substantial advantages for various downstream applications, including entity extraction, ontology development, and question answering. Unfortunately, this pivotal facet of pattern extraction has remained relatively overlooked by researchers in the domains of natural language processing (NLP) and information retrieval. To address this gap, I introduce an attention-based supervised deep learning model, ASPER. ASPER is designed to extract SDPs that denote semantic relationships between entities within a given sentential context. I rigorously evaluate the performance of ASPER across three distinct semantic relations: hyponym-hypernym, cause-effect, and meronym-holonym, utilizing six datasets. My experimental findings demonstrate ASPER's ability to automatically identify an array of SDPs that mirror the presence of these semantic relationships within sentences, outperforming existing pattern extraction methods by a substantial margin. Second, I want to use the SDPs to extract semantic pairs from sentences. I choose to extract cause-effect entities from medical literature. This task is instrumental in compiling various causality relationships, such as those between diseases and symptoms, medications and side effects, and genes and diseases. Existing solutions excel in sentences where cause and effect phrases are straightforward, such as named entities, single-word nouns, or short noun phrases. However, in the complex landscape of medical literature, cause and effect expressions often extend over several words, stumping existing methods, resulting in incomplete extractions that provide low-quality, non-informative, and at times, conflicting information. To overcome this challenge, I introduce an innovative unsupervised method for extracting cause and effect phrases, PatternCausality tailored explicitly for medical literature. PatternCausality employs a set of cause-effect dependency patterns as templates to identify the key terms within cause and effect phrases. It then utilizes a novel phrase extraction technique to produce comprehensive and meaningful cause and effect expressions from sentences. Experiments conducted on a dataset constructed from PubMed articles reveal that PatternCausality significantly outperforms existing methods, achieving a remarkable order of magnitude improvement in the F-score metric over the best-performing alternatives. I also develop various PatternCausality variants that utilize diverse phrase extraction methods, all of which surpass existing approaches. PatternCausality and its variants exhibit notable performance improvements in extracting cause and effect entities in a domain-neutral benchmark dataset, wherein cause and effect entities are confined to single-word nouns or noun phrases of one to two words. Nevertheless, PatternCausality operates within an unsupervised framework and relies heavily on SDPs, motivating me to explore the development of a supervised approach. Although SDPs play a pivotal role in semantic relation extraction, pattern-based methodologies remain unsupervised, and the multitude of potential patterns within a language can be overwhelming. Furthermore, patterns do not consistently capture the broader context of a sentence, leading to the extraction of false-positive semantic pairs. As an illustration, consider the hyponym-hypernym pattern the w of u which can correctly extract semantic pairs for a sentence like the village of Aasu but fails to do so for the phrase the moment of impact. The root cause of this limitation lies in the pattern's inability to capture the nuanced meaning of words and phrases in a sentence and their contextual significance. These observations have spurred my exploration of a third model, DepBERT which constitutes a dependency-aware supervised transformer model. DepBERT's primary contribution lies in introducing the underlying dependency structure of sentences to a language model with the aim of enhancing token classification performance. To achieve this, I must first reframe the task of semantic pair extraction as a token classification problem. The DepBERT model can harness both the tree-like structure of dependency patterns and the masked language architecture of transformers, marking a significant milestone, as most large language models (LLMs) predominantly focus on semantics and word co-occurrence while neglecting the crucial role of dependency architecture. In summary, my overarching contributions in this thesis are threefold. First, I validate the significance of the dependency architecture within various components of sentences and publish SDPs that incorporate these dependency relationships. Subsequently, I employ these SDPs in a practical medical domain to extract vital cause-effect pairs from sentences. Finally, my third contribution distinguishes this thesis by integrating dependency relations into a deep learning model, enhancing the understanding of language and the extraction of valuable semantic associations.
Emergency Medical Service EMR-Driven Concept Extraction From Narrative Text
(2021-08) George, Susanna Serene; King, Brian; Luo, Xiao; Wyss, Catharine
Being in the midst of a pandemic with patients having minor symptoms that quickly become fatal to patients with situations like a stemi heart attack, a fatal accident injury, and so on, the importance of medical research to improve speed and efficiency in patient care, has increased. As researchers in the computer domain work hard to use automation in technology in assisting the first responders in the work they do, decreasing the cognitive load on the field crew, time taken for documentation of each patient case and improving accuracy in details of a report has been a priority. This paper presents an information extraction algorithm that custom engineers certain existing extraction techniques that work on the principles of natural language processing like metamap along with syntactic dependency parser like spacy for analyzing the sentence structure and regular expressions to recurring patterns, to retrieve patient-specific information from medical narratives. These concept value pairs automatically populates the fields of an EMR form which could be reviewed and modified manually if needed. This report can then be reused for various medical and billing purposes related to the patient.
An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences
(2013-11-12) Newsom, Eric Tyner; Jones, Josette F.; Gamache, Roland E.; Mahoui, Malika
The amount of information produced in the form of electronic free text in healthcare is increasing to levels incapable of being processed by humans for advancement of his/her professional practice. Information extraction (IE) is a sub-field of natural language processing with the goal of data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create a logical expression necessary for processing meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics and none have addressed the issue of semantic similarity (or synonymy) to achieve data reduction. To achieve data reduction, a successful methodology for data reduction is dependent on a framework that can represent currently popular phrasal methods of IE but also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements to using the predicate-argument statement (PAS) as the framework. A convenient sample from a prior study with ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing will be the text from which PAS structures are formed.
Extraction and Evaluation of Medication Data from Electronic Dental Records
(IOS Press, 2017) Wang, Yue; Siddiqui, Zasim; Krishnan, Anand; Patel, Jay; Thyvalikakath, Thankam; Cariology, Operative Dentistry and Dental Public Health, School of Dentistry
With an increase in the geriatric population, dental care professionals are presented with older patients who are managing their comorbidities using multiple medications. In this study, we developed a system to extract medication information from electronic dental records (EDRs) and provided patient distribution by the number of medications.
Extraction and Evaluation of Medication Data from Electronic Dental Records
(IOS Press, 2017) Wang, Yue; Siddiqui, Zasim; Krishnan, Anand; Patel, Jay; Thyvalikakath, Thankam; Cariology, Operative Dentistry and Dental Public Health, School of Dentistry
With an increase in the geriatric population, dental care professionals are presented with older patients who are managing their comorbidities using multiple medications. In this study, we developed a system to extract medication information from electronic dental records (EDRs) and provided patient distribution by the number of medications.
Extraction of pharmacokinetic evidence of drug-drug interactions from the literature
(PLoS, 2015-05-11) Kolchinsky, Artemy; Lourenço, Anália; Wu, Heng-Yi; Li, Lang; Rocha, Luis M.; Department of Medical and Molecular Genetics, IU School of Medicine
Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmacoepidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1≈0.93, MCC≈0.74, iAUC≈0.99) and sentences (F1≈0.76, MCC≈0.65, iAUC≈0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. We also found that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. Based on our thorough analysis of classifiers and feature transforms and the high classification performance achieved, we demonstrate that literature mining can aid DDI discovery by supporting automatic extraction of specific types of experimental evidence.

Browsing by Subject "Natural Language Processing"

Results Per Page

Sort Options