Browsing by Subject "Computational linguistics"
Item Advanced natural language processing and temporal mining for clinical discovery (2015-08-17)
Mehrabi, Saeed; Jones, Josette F.; Palakal, Mathew J.; Chien, Stanley Yung-Ping; Liu, Xiaowen; Schmidt, C. Max
There has been a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) as a result of the HITECH Act of 2009. It is estimated that around 80% of clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered opportunities to extract the information needed for various clinical applications from unstructured clinical texts. A popular method for enabling secondary uses of EHRs is information or concept extraction, a subtask of NLP that seeks to locate and classify elements within text based on the context. Extraction of clinical concepts without considering the context has many complications, including inaccurate diagnosis of patients and contamination of study cohorts. Identifying the negation status of a clinical concept and whether it belongs to the patient or to family members are two of the challenges faced in context detection. A negation algorithm called Dependency Parser Negation (DEEPEN) has been developed in this research study by taking into account the dependency relationship between negation words and concepts within a sentence using the Stanford Dependency Parser. The study results demonstrate that DEEPEN can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. Additionally, an NLP system consisting of section segmentation and relation discovery was developed to identify patients' family history.
To assess the generalizability of the negation and family history algorithms, data from a different clinical institution was used in both algorithm evaluations.

Item A nonparametric Bayesian perspective for machine learning in partially-observed settings (2014-07-31)
Akova, Ferit; Dundar, Mehmet Murat; Qi, Yuan Alan
Robustness and generalizability of supervised learning algorithms depend on the quality of the labeled data set in representing the real-life problem. In many real-world domains, however, we may not have full knowledge of the underlying data-generating mechanism, which may even have an evolving nature introducing new classes continually. This constitutes a partially-observed setting, where it would be impractical to obtain a labeled data set exhaustively defined by a fixed set of classes. Traditional supervised learning algorithms, assuming an exhaustive training library, would misclassify a future sample of an unobserved class with probability one, leading to an ill-defined classification problem. Our goal is to address situations where such an assumption is violated by a non-exhaustive training library, which is a very realistic yet overlooked issue in supervised learning. In this dissertation we pursue a new direction for supervised learning by defining self-adjusting models to relax the fixed model assumption imposed on classes and their distributions. We let the model adapt itself to the prospective data by dynamically adding new classes/components as the data demand, which in turn gradually makes the model more representative of the entire population. In this framework, we first employ suitably chosen nonparametric priors to model class distributions for observed as well as unobserved classes and then utilize new inference methods to classify samples from observed classes and discover/model novel classes for those from unobserved classes.
This thesis presents the initiating steps of an ongoing effort to address one of the most overlooked bottlenecks in supervised learning and indicates the potential for taking new perspectives in some of the most heavily studied areas of machine learning: novelty detection, online class discovery and semi-supervised learning.

Item Pharmacodynamics miner: an automated extraction of pharmacodynamic drug interactions (2013-12-11)
Lokhande, Hrishikesh; Li, Lang; Liu, Yunlong; Liu, Xiaowen
Pharmacodynamics (PD) studies the relationship between drug concentration and drug effect on target sites. This field has recently gained attention as studies involving PD Drug-Drug interactions (DDI) assure discovery of multi-targeted drug agents and novel efficacious drug combinations. A PD drug combination could be synergistic, additive or antagonistic depending upon the summed effect of the drug combination at a target site. The PD literature has grown immensely and most of its knowledge is dispersed across different scientific journals; thus the manual identification of PD DDI is a challenge. In order to support an automated means to extract PD DDI, we propose Pharmacodynamics Miner (PD-Miner). PD-Miner is a text-mining tool capable of identifying PD DDI from in vitro PD experiments. It is powered by two major features, i.e., a collection of full text articles and an in vitro PD ontology. The in vitro PD ontology currently has four classes and more than a hundred subclasses; based on these classes and subclasses the full text corpus is annotated. The annotated full text corpus forms a database of articles, which can be queried based upon drug keywords and ontology subclasses. Since the ontology covers term and concept meanings, the system is capable of formulating semantic queries. PD-Miner extracts in vitro PD DDI based upon references to cell lines and cell phenotypes. The results are in the form of fragments of sentences in which important concepts are visually highlighted.
To determine the accuracy of the system, we used a gold standard of five expert-curated articles. PD-Miner identified DDI with a recall of 75% and a precision of 46.55%. Along with the development of PD-Miner, we also report development of a semantically annotated in vitro PD corpus. This corpus includes term and sentence level annotations and serves as a gold standard for future text mining.

Item Query Segmentation For E-Commerce Sites (2013-07-12)
Gong, Xiaojing; Al Hasan, Mohammad; Fang, Shiaofen; Raje, Rajeev
The query segmentation module is an integral part of natural language processing that analyzes users' queries and divides them into separate phrases. Published work on query segmentation focuses on web search using the Google n-gram frequencies corpus or on text retrieval from relational databases. However, this module is also useful in the domain of E-Commerce for product search. In this thesis, we discuss query segmentation in the context of the E-Commerce area. We propose a hybrid unsupervised segmentation methodology that uses a prefix tree, mutual information and relative frequency counts to compute the score of query pairs, and involves Wikipedia for new-word recognition. Furthermore, we use two unique E-Commerce evaluation methods to quantify the accuracy of our query segmentation method.

Item Semantic and phonetic similarity of verbal fluency responses in early-stage psychosis (Elsevier, 2022)
Lundin, Nancy B.; Jones, Michael N.; Myers, Evan J.; Breier, Alan; Minor, Kyle S.; Psychology, School of Science
Linguistic abnormalities can emerge early in the course of psychotic illness. Computational tools that quantify similarity of responses in standardized language-based tasks such as the verbal fluency test could efficiently characterize the nature and functional correlates of these disturbances.
Participants with early-stage psychosis (n=20) and demographically matched controls without a psychiatric diagnosis (n=20) performed category and letter verbal fluency. Semantic similarity was measured via predicted context co-occurrence in a large text corpus using Word2Vec. Phonetic similarity was measured via edit distance using the VFClust tool. Responses were designated as clusters (related items) or switches (transitions to less related items) using similarity-based thresholds. Results revealed that participants with early-stage psychosis compared to controls had lower fluency scores, lower cluster-related semantic similarity, and fewer switches; mean cluster size and phonetic similarity did not differ by group. Lower fluency semantic similarity was correlated with greater speech disorganization (Communication Disturbances Index), although more strongly in controls, and correlated with poorer social functioning (Global Functioning: Social), primarily in the psychosis group. Findings suggest that search for semantically related words may be impaired soon after psychosis onset. Future work is warranted to investigate the impact of language disturbances on social functioning over the course of psychotic illness.
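
The cluster/switch designation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the VFClust implementation or the authors' procedure: it applies plain Levenshtein edit distance to spellings rather than to phonetic transcriptions, and the function names and the 0.5 threshold are hypothetical choices made only for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def label_transitions(responses: List[str], threshold: float = 0.5) -> List[str]:
    """Label each consecutive pair of responses as 'cluster' or 'switch'.

    The threshold is an assumed value, not one taken from the study.
    """
    labels = []
    for prev_word, word in zip(responses, responses[1:]):
        if similarity(prev_word, word) >= threshold:
            labels.append("cluster")
        else:
            labels.append("switch")
    return labels


if __name__ == "__main__":
    print(label_transitions(["cat", "cap", "dog"]))
```

The same thresholding pattern would apply to semantic similarity if `similarity` were replaced by a cosine similarity between word embeddings.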