- Browse by Author
Browsing by Author "Wu, Heng-Yi"
Now showing 1 - 8 of 8
Results Per Page
Sort Options
Item Analyzing Patterns of Literature-Based Phenotyping Definitions for Text Mining Applications(IEEE, 2018-06) Binkheder, Samar; Wu, Heng-Yi; Quinney, Sara; Li, Lang; BioHealth Informatics, School of Informatics and ComputingPhenotyping definitions are widely used in observational studies that utilize population data from Electronic Health Records (EHRs). Biomedical text mining supports biomedical knowledge discovery. Therefore, we believe that mining phenotyping definitions from the literature can support EHR-based clinical research. However, information about these definitions presented in the literature is inconsistent, diverse, and unknown, especially for text mining usage. Therefore, we aim to analyze patterns of phenotyping definitions as a first step toward developing a text mining application to improve phenotype definition. A set random of observational studies was used for this analysis. Term frequency-inverse document frequency (TF-IDF) and Term Frequency (TF) were used to rank the terms in the 3958 sentences. Finally, we present preliminary results analyzing phenotyping definitions patterns.Item Extraction of pharmacokinetic evidence of drug-drug interactions from the literature(PLoS, 2015-05-11) Kolchinsky, Artemy; Lourenço, Anália; Wu, Heng-Yi; Li, Lang; Rocha, Luis M.; Department of Medical and Molecular Genetics, IU School of MedicineDrug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmacoepidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1≈0.93, MCC≈0.74, iAUC≈0.99) and sentences (F1≈0.76, MCC≈0.65, iAUC≈0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. We also found that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. Based on our thorough analysis of classifiers and feature transforms and the high classification performance achieved, we demonstrate that literature mining can aid DDI discovery by supporting automatic extraction of specific types of experimental evidence.Item Leveraging syntactic and semantic graph kernels to extract pharmacokinetic drug drug interactions from biomedical literature(BioMed Central, 2016-08-26) Zhang, Yaoyun; Wu, Heng-Yi; Xu, Jun; Wang, Jingqi; Soysal, Ergin; Li, Lang; Xu, Hua; Department of Medicine, IU School of MedicineBACKGROUND: Information about drug-drug interactions (DDIs) supported by scientific evidence is crucial for establishing computational knowledge bases for applications like pharmacovigilance. Since new reports of DDIs are rapidly accumulating in the scientific literature, text-mining techniques for automatic DDI extraction are critical. We propose a novel approach for automated pharmacokinetic (PK) DDI detection that incorporates syntactic and semantic information into graph kernels, to address the problem of sparseness associated with syntactic-structural approaches. First, we used a novel all-path graph kernel using shallow semantic representation of sentences. Next, we statistically integrated fine-granular semantic classes into the dependency and shallow semantic graphs. RESULTS: When evaluated on the PK DDI corpus, our approach significantly outperformed the original all-path graph kernel that is based on dependency structure. Our system that combined dependency graph kernel with semantic classes achieved the best F-scores of 81.94 % for in vivo PK DDIs and 69.34 % for in vitro PK DDIs, respectively. Further, combining shallow semantic graph kernel with semantic classes achieved the highest precisions of 84.88 % for in vivo PK DDIs and 74.83 % for in vitro PK DDIs, respectively. CONCLUSIONS: We presented a graph kernel based approach to combine syntactic and semantic information for extracting pharmacokinetic DDIs from Biomedical Literature. Experimental results showed that our proposed approach could extract PK DDIs from literature effectively, which significantly enhanced the performance of the original all-path graph kernel based on dependency structure.Item PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature(Springer, 2022) Binkheder, Samar; Wu, Heng-Yi; Quinney, Sara K.; Zhang, Shijun; Zitu, Md. Muntasir; Chiang, Chien-Wei; Wang, Lei; Jones, Josette; Li, Lang; BioHealth Informatics, School of Informatics and ComputingBackground Adverse events induced by drug-drug interactions are a major concern in the United States. Current research is moving toward using electronic health record (EHR) data, including for adverse drug events discovery. One of the first steps in EHR-based studies is to define a phenotype for establishing a cohort of patients. However, phenotype definitions are not readily available for all phenotypes. One of the first steps of developing automated text mining tools is building a corpus. Therefore, this study aimed to develop annotation guidelines and a gold standard corpus to facilitate building future automated approaches for mining phenotype definitions contained in the literature. Furthermore, our aim is to improve the understanding of how these published phenotype definitions are presented in the literature and how we annotate them for future text mining tasks. Results Two annotators manually annotated the corpus on a sentence-level for the presence of evidence for phenotype definitions. Three major categories (inclusion, intermediate, and exclusion) with a total of ten dimensions were proposed characterizing major contextual patterns and cues for presenting phenotype definitions in published literature. The developed annotation guidelines were used to annotate the corpus that contained 3971 sentences: 1923 out of 3971 (48.4%) for the inclusion category, 1851 out of 3971 (46.6%) for the intermediate category, and 2273 out of 3971 (57.2%) for exclusion category. The highest number of annotated sentences was 1449 out of 3971 (36.5%) for the “Biomedical & Procedure” dimension. The lowest number of annotated sentences was 49 out of 3971 (1.2%) for “The use of NLP”. The overall percent inter-annotator agreement was 97.8%. Percent and Kappa statistics also showed high inter-annotator agreement across all dimensions. Conclusions The corpus and annotation guidelines can serve as a foundational informatics approach for annotating and mining phenotype definitions in literature, and can be used later for text mining applications.Item Text mining for drug-drug interaction(Springer-Verlag, 2014) Wu, Heng-Yi; Chiang, Chien-Wei; Li, Lang; Department of Medicine, IU School of MedicineIn order to understand the mechanisms of drug-drug interaction (DDI), the study of pharmacokinetics (PK), pharmacodynamics (PD), and pharmacogenetics (PG) data are significant. In recent years, drug PK parameters, drug interaction parameters, and PG data have been unevenly collected in different databases and published extensively in literature. Also the lack of an appropriate PK ontology and a well-annotated PK corpus, which provide the background knowledge and the criteria of determining DDI, respectively, lead to the difficulty of developing DDI text mining tools for PK data collection from the literature and data integration from multiple databases.To conquer the issues, we constructed a comprehensive pharmacokinetics ontology. It includes all aspects of in vitro pharmacokinetics experiments, in vivo pharmacokinetics studies, as well as drug metabolism and transportation enzymes. Using our pharmacokinetics ontology, a PK corpus was constructed to present four classes of pharmacokinetics abstracts: in vivo pharmacokinetics studies, in vivo pharmacogenetic studies, in vivo drug interaction studies, and in vitro drug interaction studies. A novel hierarchical three-level annotation scheme was proposed and implemented to tag key terms, drug interaction sentences, and drug interaction pairs. The utility of the pharmacokinetics ontology was demonstrated by annotating three pharmacokinetics studies; and the utility of the PK corpus was demonstrated by a drug interaction extraction text mining analysis.The pharmacokinetics ontology annotates both in vitro pharmacokinetics experiments and in vivo pharmacokinetics studies. The PK corpus is a highly valuable resource for the text mining of pharmacokinetics parameters and drug interactions.Item Translational Biomedical Informatics and Pharmacometrics Approaches in the Drug Interactions Research(Wiley, 2018) Zhang, Pengyue; Wu, Heng-Yi; Chiang, Chien-Wei; Binkheder, Samar; Wang, Xueying; Zeng, Donglin; Quinney, Sara K.; Li, Lang; BioHealth Informatics, School of Informatics and ComputingDrug interaction is a leading cause of adverse drug events and a major obstacle for current clinical practice. Pharmacovigilance data mining, pharmacokinetic modeling, and text mining are computation and informatic tools on integrating drug interaction knowledge and generating drug interaction hypothesis. We provide a comprehensive overview of these translational biomedical informatics methodologies with related databases. We hope this review illustrates the complementary nature of these informatic approaches and facilitates the translational drug interaction research.Item Translational drug interaction study using text mining technology(2017-08-15) Wu, Heng-Yi; Jones, Josette; Li, Lang; Palakal, Mathew; Wu, HuanmeiDrug-Drug Interaction (DDI) is one of the major causes of adverse drug reaction (ADR) and has been demonstrated to threat public health. It causes an estimated 195,000 hospitalizations and 74,000 emergency room visits each year in the USA alone. Current DDI research aims to investigate different scopes of drug interactions: molecular level of pharmacogenetics interaction (PG), pharmacokinetics interaction (PK), and clinical pharmacodynamics consequences (PD). All three types of experiments are important, but they are playing different roles for DDI research. As diverse disciplines and varied studies are involved, interaction evidence is often not available cross all three types of evidence, which create knowledge gaps and these gaps hinder both DDI and pharmacogenetics research. In this dissertation, we proposed to distinguish the three types of DDI evidence (in vitro PK, in vivo PK, and clinical PD studies) and identify all knowledge gaps in experimental evidence for them. This is a collective intelligence effort, whereby a text mining tool will be developed for the large-scale mining and analysis of drug-interaction information such that it can be applied to retrieve, categorize, and extract the information of DDI from published literature available on PubMed. To this end, three tasks will be done in this research work: First, the needed lexica, ontology, and corpora for distinguishing three different types of studies were prepared. Despite the lexica prepared in this work, a comprehensive dictionary for drug metabolites or reaction, which is critical to in vitro PK study, is still lacking in pubic databases. Thus, second, a name entity recognition tool will be proposed to identify drug metabolites and reaction in free text. Third, text mining tools for retrieving DDI articles and extracting DDI evidence are developed. In this work, the knowledge gaps cross all three types of DDI evidence can be identified and the gaps between knowledge of molecular mechanisms underlying DDI and their clinical consequences can be closed with the result of DDI prediction using the retrieved drug gene interaction information such that we can exemplify how the tools and methods can advance DDI pharmacogenetics research.Item Using machine learning algorithms to identify genes essential for cell survival(BMC, 2017-10-03) Philips, Santosh; Wu, Heng-Yi; Li, Lang; Medical and Molecular Genetics, School of MedicineBackground With the explosion of data comes a proportional opportunity to identify novel knowledge with the potential for application in targeted therapies. In spite of this huge amounts of data, the solutions to treating complex disease is elusive. One reason being that these diseases are driven by a network of genes that need to be targeted in order to understand and treat them effectively. Part of the solution lies in mining and integrating information from various disciplines. Here we propose a machine learning method to mining through publicly available literature on RNA interference with the goal of identifying genes essential for cell survival. Results A total of 32,164 RNA interference abstracts were identified from 10.5 million pubmed abstracts (2001 - 2015). These abstracts spanned over 1467 cancer cell lines and 4373 genes representing a total of 25,891 cell gene associations. Among the 1467 cell lines 88% of them had at least 1 or up to 25 genes studied in a given cell line. Among the 4373 genes 96% of them were studied in at least 1 or up to 25 different cell lines. Conclusions Identifying genes that are crucial for cell survival can be a critical piece of information especially in treating complex diseases, such as cancer. The efficacy of a therapeutic intervention is multifactorial in nature and in many cases the source of therapeutic disruption could be from an unsuspected source. Machine learning algorithms helps to narrow down the search and provides information about essential genes in different cancer types. It also provides the building blocks to generate a network of interconnected genes and processes. The information thus gained can be used to generate hypothesis which can be experimentally validated to improve our understanding of what triggers and maintains the growth of cancerous cells. Electronic supplementary material The online version of this article (10.1186/s12859-017-1799-1) contains supplementary material, which is available to authorized users.