Browsing by Author "Palakal, Mathew"
Now showing 1 - 10 of 20
Item Bridging Text Mining and Bayesian Networks (2011-03-09) Raghuram, Sandeep Mudabail; Xia, Yuni; Palakal, Mathew; Zou, Xukai, 1963-
After the initial network is constructed using an expert’s knowledge of the domain, Bayesian networks need to be updated as new data are observed. Literature mining is a very important source of this new data. In this work, we explore what kind of data needs to be extracted in order to update Bayesian networks, which existing technologies can help achieve some of these goals, and what research is required to meet the remaining requirements. This thesis specifically deals with utilizing causal associations and experimental results that can be obtained from literature mining. However, these associations and numerical results cannot be directly integrated with the Bayesian network. The source of the literature and the perceived quality of the research need to be factored into the process of integration, just as a human reading the literature would do. This thesis presents a general methodology for updating a Bayesian network with the mined data. This methodology consists of solutions to some of the issues surrounding the task of integrating the causal associations with the Bayesian network and demonstrates the idea with a semi-automated software system.

Item Bridging the Phenotype-Genotype Gap for disease prognosis (Office of the Vice Chancellor for Research, 2013-04-05) Palakal, Mathew; Pradham, Meeta; Sakhare, Shruti
A well-known question that we have been trying to answer for the past two decades is “What is the relationship between genotypes and phenotypes?” Currently, methods such as Genome Wide Association Studies (GWAS) and Gene Regulatory Networks (GRNs) are used to find these phenotype-genotype relationships using statistics and molecular biology, respectively. These studies mainly focus on a limited number of phenotypes for direct mapping.
However, it has been reported that disease traits are the outcome of many interdependent changes in phenotype. Our study aims to use the extensive clinical and genotype data from publicly available databases to study this interdependency of clinical outcomes and the corresponding changes in gene expression patterns. The present work on understanding the genotype-phenotype relationship across different stages is based on the available TCGA data for breast cancer. The clinical features were identified and classified based on laboratory and other clinical parameters. We selected 60 phenotypes based on their importance as reported in the literature, and these were clustered by their significance for cancer prognosis and their expression at different stages. Multivariate statistical analysis was performed on the outliers from the clusters to identify the interdependency of their expression. An expression profile of these outliers was obtained from this analysis. The analysis shows the significant phenotypes expressed in different stages of breast cancer. Some of these significant phenotypes have previously been reported for breast cancer prognosis. However, the clustering analysis also identified new phenotypes that may play a significant role in breast cancer prognosis. A correlation study of these parameters can then identify the relational expression of multiple clinical traits. Following this study, the genotype features will be analyzed for SNP and CNV variants associated with these parameters to bridge the genotype-phenotype gap. By successfully identifying the molecular changes at the gene level behind such phenotypic diversity of clinical traits, it may become possible to predict the onset of disease at an early stage.
The current methodology can then be extended to other disease studies.

Item Decision Support System For Geriatric Care (Office of the Vice Chancellor for Research, 2010-04-09) Palakal, Mathew; Pandit, Yogesh; Jones, Josette; Xia, Yuni; Bandos, Jean; Geesaman, Jerry; Pecenka, Dave; Tinsley, Eric
Geriatrics is a branch of medicine that focuses on the healthcare of the elderly. We propose to build a decision support system for elderly care based on a knowledge base that incorporates best practices reported in the literature. A Bayesian network model is then used for decision support in the geriatric care tool that we develop.

Item DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx (Elsevier, 2015-04) Mehrabi, Saeed; Krishnan, Krishnan; Sohn, Sunghwan; Roch, Alexandra M; Schmidt, Heidi; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, C. Max; Liu, Hongfang; Palakal, Mathew; Surgery, School of Medicine
In Electronic Health Records (EHRs), much valuable information regarding patients’ conditions is embedded in free-text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. A negation detection algorithm, NegEx, applies a simplistic approach that has been shown to be powerful in clinical NLP. However, because it does not consider the contextual relationship between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment could cause inaccurate diagnosis of patients’ conditions or contaminated study cohorts. We developed a negation algorithm called DEEPEN to decrease NegEx’s false positives by taking into account the dependency relationship between negation words and concepts within a sentence, using the Stanford dependency parser.
The system was developed and tested using EHR data from Indiana University (IU) and was further evaluated on a Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs.

Item Development of an Automated Mapping Tool to Transform Nursing Narrative Information into Quantifiable Nursing Data (Office of the Vice Chancellor for Research, 2014-04-11) Lee, Mikyoung; Patil, Sachin; Palakal, Mathew
Background: Inspecting the effectiveness of health care has been a central focus of health care professionals challenged by a system with aggressive cost constraints and increasing demands for quality of care. This focus has highlighted the importance of having health care data and facilitated the use of large data sets. It is crucial that nurses clearly verify the economic and clinical value of nursing interventions for the improvement of patient outcomes. However, the effectiveness of nursing care in hospitals has rarely been demonstrated, because nurse scientists have been unable to obtain valid and comparable nursing data electronically. The importance of “computable” nursing data and databases has long been recognized and has led to the development of standardized nursing terminologies (SNTs) to represent nursing interventions and outcomes. Yet a majority of nursing information systems in hospitals still use nurses’ free-text records to document care processes and patient outcomes. Free-text records, which may hold rich information on nursing phenomena but are not computable, have been of limited use for generating nursing information and knowledge. Therefore, the study aimed to develop an automated mapping tool to extract and transform narrative nursing notes into quantifiable data in SNTs.
Method: The nursing narrative notes were collected from a retrospective nursing record review of patients who were admitted to a community hospital with a diagnosis of septicemia. The Nursing Interventions Classification (NIC) and the Nursing Outcome Classification (NOC) were the SNTs used for mapping. The automated mapping tool was developed using natural language processing; the graphical user interface was designed using NetBeans IDE and the Perl programming language. Tokenizing each sentence to identify single-word term candidates, stemming them, applying lexical collocations to combine the words into meaningful information (phrases/sentences), and mapping them to labels and indicators of SNTs were accomplished using regular expressions. The tool was validated by comparing its results with the results of manual mapping by 2 nursing experts, which was considered the gold standard.
Results: The interface features of the automated mapping tool included data entry options (i.e., browse/upload files or type in each nursing narrative sentence), mapping sources to select NIC and NOC dictionaries, their domains and classes by their hierarchical classification structure, and output options (i.e., nursing representation with the mapped terms, frequency of the mapped terms). A total of 25588 words from the nursing narrative records of 14 patients were used. On average, 52 parsed phrases or sentences per nursing record were mapped. In total, 768 labels of NIC and 4733 indicators of NOC were mapped, including duplicates. Compared with the manually mapped terms (the gold standard), the automated mapping tool showed accuracy rates ((true positive + true negative)/overall mapped) of 80.6% with NIC and 74.8% with NOC.
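The regular-expression mapping and the accuracy measure described above can be illustrated with a minimal Python sketch. It is not the authors' tool (which the abstract says used Perl): the two patterns, the function names `map_sentence` and `accuracy`, and the example counts are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-dictionary (label -> pattern). The tool's real NIC/NOC
# dictionaries and regular expressions are not given in the abstract.
PATTERNS = {
    "Coughing": re.compile(r"\bcough(s|ed|ing)?\b", re.IGNORECASE),
    "Report changes in patient status": re.compile(
        r"\b(report(s|ed|ing)?|notif(y|ies|ied))\b.*\b(physician|doctor)\b",
        re.IGNORECASE,
    ),
}


def map_sentence(sentence: str) -> set[str]:
    """Return every label whose pattern matches the narrative sentence."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(sentence)}


def accuracy(true_positive: int, true_negative: int, total_mapped: int) -> float:
    """(true positive + true negative) / overall mapped, as a fraction."""
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return (true_positive + true_negative) / total_mapped


if __name__ == "__main__":
    print(map_sentence("Pt coughing overnight; reported to physician."))
    # Illustrative counts only, not taken from the study.
    print(f"{accuracy(true_positive=70, true_negative=10, total_mapped=100):.1%}")
```

In this sketch a sentence may map to several labels at once, which is one plausible reading of why the reported totals include duplicates.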
The most frequently mapped descriptor of NIC was ‘Report changes in patient status’ under the label ‘Physician Support (7710)’. The most commonly mapped indicator of NOC was ‘Coughing (041019)’ under the label ‘Respiratory Status: Airway Patency’. Nurses were more likely to document their observations of patient status than the nursing interventions they provided.
Conclusion/Implications: The new automated mapping tool showed high performance at the initial stage. Validation of the tool will continue with more nursing narrative data. It is expected that the tool will be useful for transforming nursing information with SNTs into quantifiable and comparable data, which can consequently be used for nursing effectiveness research. It can be used for outcomes analyses, regulatory quality report generation, and text analysis for finding appropriate nursing literature and capturing nursing concepts in qualitative research. The study findings can also contribute to the development and refinement of SNTs to more accurately represent nursing practice.

Item A Dynamic, User-centric Big Data Analytics Framework for Genome Data (Office of the Vice Chancellor for Research, 2015-04-17) Ravishankar, Shalini; Pradhan, Meeta; Palakal, Mathew
The cost of sequencing DNA has fallen from $100 million to just over $1,000, and this has greatly increased the volume of genomic data being generated. However, analysis of such large data requires meeting user needs and computational challenges. Various tools exist to process sequenced DNA information for alignment and research. These tools have been adapted to work in a big data processing environment such as Hadoop. However, the analysis of such sequence data depends on user-specific needs, and hence a unique data analysis pipeline is needed for each user.
We propose a barcode-driven technology to instruct a Hadoop-based big data analytics system that would allow the user to select the necessary tools to process the input genome data file. The proposed framework can dynamically generate customized barcodes for each user based on the user’s data analysis needs, and a pipeline is created and driven by the barcode. This approach will revolutionize the way NGS data analytics pipelines are set up by the user. This new method will provide the user with a seamless way to analyze the data. The time taken to process a genomic file was reduced from 2 hours on a traditional Linux server to just 3.81 minutes on Hadoop. Our results indicate that a barcode-based approach will enable the user to customize NGS data analysis in a very efficient manner.

Item Finding the Patient’s Voice Using Big Data: Analysis of Users’ Health-Related Concerns in the ChaCha Question-and-Answer Service (2009–2012) (JMIR, 2016) Priest, Chad; Knopf, Amelia; Groves, Doyle; Carpenter, Janet S.; Furrey, Christopher; Krishnan, Anand; Miller, Wendy R.; Otte, Julie L.; Palakal, Mathew; Wiehe, Sarah E.; Wilson, Jeffrey S.; IU School of Nursing
Background: The development of effective health care and public health interventions requires a comprehensive understanding of the perceptions, concerns, and stated needs of health care consumers and the public at large. Big datasets from social media and question-and-answer services provide insight into the public’s health concerns and priorities without the financial, temporal, and spatial encumbrances of more traditional community-engagement methods and may prove a useful starting point for public-engagement health research (infodemiology).
Objective: The objective of our study was to describe user characteristics and health-related queries of the ChaCha question-and-answer platform, and discuss how these data may be used to better understand the perceptions, concerns, and stated needs of health care consumers and the public at large.
Methods: We conducted a retrospective automated textual analysis of anonymous user-generated queries submitted to ChaCha between January 2009 and November 2012. A total of 2.004 billion queries were read, of which 3.50% (70,083,796/2,004,243,249) were missing 1 or more data fields, leaving 1.934 billion complete lines of data for these analyses.
Results: Males and females submitted roughly equal numbers of health queries, but content differed by sex. Questions from females predominantly focused on pregnancy, menstruation, and vaginal health. Questions from males predominantly focused on body image, drug use, and sexuality. Adolescents aged 12–19 years submitted more queries than any other age group. Their queries were largely centered on sexual and reproductive health, and pregnancy in particular.
Conclusions: The private nature of the ChaCha service provided a perfect environment for maximum frankness among users, especially among adolescents posing sensitive health questions. Adolescents’ sexual health queries reveal knowledge gaps with serious, lifelong consequences. The nature of questions to the service provides opportunities for rapid understanding of health concerns and may lead to development of more effective tailored interventions. [J Med Internet Res 2016;18(3):e44]

Item GCell A Sub-Cellular Localization Tool Dhaval, Rakesh; Palakal, Mathew
The aim of this thesis is to develop a biological database mining tool that incorporates mining of various publicly available heterogeneous databases and provides researchers with a reporting and visualization tool for sub-cellular localization of genes and proteins.
Although there is little conservation of the primary structure, the general physicochemical properties are conserved to some extent among proteins that share a sub-cellular location. Hence, the function of a protein is closely correlated with its sub-cellular location. Data in the fields of genomics and proteomics are detailed, complex, voluminous, and distributed across heterogeneous databases. Most of the earlier work in information extraction from biological databases focused on database integration using wrapper techniques. However, little work has been done to mine specific data leading to the identification of pathway information and evolutionary relationships from heterogeneous biological databases. The need for an interactive information visualization tool that supports biological pathway detection for genes, using controlled vocabulary and various publicly available biological databases, has led to the concept and implementation of GCell. This system allows a researcher to move from raw text data at a broad level to a much more detailed view of pathways representing complex biological interactions.

Item IDENTIFICATION OF CAUSE AND EFFECT IN CAUSAL SENTENCES OF GERIATRIC CARE DOMAIN USING CONDITIONAL RANDOM (Office of the Vice Chancellor for Research, 2012-04-13) Mehrabi, Saeed; Krishnan, Anand; Palakal, Mathew
Event extraction is a key step in many text mining applications. Identified events can be used in various applications such as question-answering systems, information extraction, summarization, or building the knowledge base of a clinical decision support system. In this study we used PubMed abstracts from the geriatric care domain that were manually categorized into 42 different subdomains and further divided into causal and non-causal sentences by three domain experts. The collected PubMed abstracts contain a total of 19,677 sentences, of which 2,856 were selected and manually annotated with cause and effect events.
We used conditional random fields (CRFs), statistical models that sequentially tag each word in a sentence as a cause or effect event based on input variables or features. The features used in this study are words, word categories (lowercase, uppercase, mixed letters and digits, etc.), affixes, part of speech, and phrase chunks such as noun or verb phrases. For every word, a window of features before and after the word was also considered. We tested window sizes of one to five, meaning that one to five features before and after each word were included as input variables. The CRF algorithm was trained and tested on a data set with 2,520 sentences in the training set, 252 sentences in the validation set, and 84 sentences in the test set. A window of four features before and after each word performed best, with 75.1% accuracy and an F-measure of 85%, with 84.6% precision and 87% recall.

Item Identification of Immuno-Oncology Crosstalk Pathways in Lung Adenocarcinoma (Office of the Vice Chancellor for Research, 2016-04-08) Sudha, Parvathi; Pradhan, Meeta; Palakal, Mathew
Identifying dysregulated pathways from high-throughput data for biomarker detection is the rate-limiting step in curing complex diseases. Pathways do not act alone; they interact with each other through overlapping genes. This phenomenon is known as pathway crosstalk. The aim of the study is to develop a methodology to find highly interacting (crosstalk) immuno-oncological pathways and their drug-gene-pathway modules, which can be further validated in vivo, using Lung Adenocarcinoma (LUAD) as a case study. The reference pathway crosstalk matrix is built using the KEGG knowledge base, which consists of 302 KEGG pathways associated with 6996 genes. The LUAD gene expression data available in The Cancer Genome Atlas (TCGA) are used for the study. Data from 32 patients were used in the study; of these, 9 patients were treated with immunotherapy drugs.
A set of 3018 significant genes associated with 296 pathways [C.I. = 95%, p-value <= 0.05] is identified in this dataset, and a disease crosstalk matrix is constructed. Each cell in the matrix gives the crosstalk score of a pair of pathways, computed from the intersection (∩) and union (∪) of their gene sets. The interactions among the significant genes (3018 genes) in the crosstalk pathways were identified using the BioGRID physical gene-gene interaction map, and a gene interaction network (10102 interactions) was generated. The significant genes in the network are annotated with their drugs as given in the clinical data of TCGA. The drug-gene-pathway modules of LUAD are identified using a Seed-Based Network Propagation Algorithm. These modules give the profile of the highest-crosstalk pathways of LUAD, which can be studied further for alternative drug targets. The study found that the T-cell receptor signaling pathway and the B-cell receptor signaling pathway of LUAD have high crosstalk scores with the ErbB signaling pathway (18.67, 15.15), the VEGF signaling pathway (17.77, 22.45), and osteoclast differentiation (16.35, 14.89).
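The crosstalk score survives in the abstract only as an intersection and a union symbol, so its exact form is not recoverable here. As a rough illustration of how a pathway crosstalk matrix of this kind could be built, the Python sketch below assumes the score is the share of overlapping genes, |A ∩ B| / |A ∪ B|, expressed as a percentage. That formula, the function names, and the toy pathways are assumptions for illustration, not the authors' method or data.

```python
from itertools import combinations


def crosstalk_score(genes_a: set[str], genes_b: set[str]) -> float:
    """Assumed score: percentage of shared genes, 100 * |A & B| / |A | B|."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 100.0 * len(genes_a & genes_b) / len(union)


def crosstalk_matrix(pathways: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of pathways (upper triangle of the matrix)."""
    return {
        (a, b): crosstalk_score(pathways[a], pathways[b])
        for a, b in combinations(sorted(pathways), 2)
    }


if __name__ == "__main__":
    # Toy gene sets with made-up identifiers, not KEGG or TCGA data.
    toy = {
        "P1": {"G1", "G2", "G3"},
        "P2": {"G2", "G3", "G4"},
        "P3": {"G5"},
    }
    for pair, score in crosstalk_matrix(toy).items():
        print(pair, round(score, 2))
```

Only the upper triangle is stored because a score of this form is symmetric in the two pathways.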