Browsing by Author "Mahoui, Malika"
Now showing 1 - 10 of 14
Item: Advancing Toxicology-Based Cancer Risk Assessment with Informatics (2010-05-03T19:38:33Z)
Bercu, Joel P.; Mahoui, Malika; Romero, Pedro R.; Stevens, James L.; Jones, Josette F.; Palakal, Mathew J.
Since exposure to carcinogens can occur in the environment from various point sources, cancer risk assessment attempts to define and limit potential exposure such that the risk of developing cancer is negligible. While cancer risk assessment is widely used, with certain methodologies well accepted in the scientific literature and regulatory guidance, there are still gaps which increase uncertainties when assessing risk, including: (1) mixtures of genotoxins, (2) genotoxic metabolites, and (3) nongenotoxic carcinogens. An in silico model was developed to predict the cancer risk of a genotoxin, which improved the methodology for single compounds and mixtures. Monte Carlo simulations performed with a carcinogenicity potency database to estimate the overall carcinogenic risk of a mixture of genotoxic compounds showed that structural similarity would not likely increase the overall cancer risk. A cancer risk model was developed for genotoxic metabolites using excretion material in both animals and humans to determine the probability of not exceeding a 1 in 100,000 excess cancer risk. Two model nongenotoxic compounds (fenofibrate and methapyrilene) were tested in short-term microarray studies to develop a framework for cancer risk assessment. It was determined that a threshold for potential key events could be derived using benchmark dose analysis in combination with well-developed ontologies (KEGG/GO), which were at or below measured tumorigenic and precursor events.
In conclusion, informatics was effective in advancing toxicology-based cancer risk assessment using databases and predictive techniques which fill critical gaps in its methodology.

Item: A Biological and Bioinformatics Ontology for Service Discovery and Data Integration (2006-07-26T15:44:50Z)
Dippold, Mindi M.; Mahoui, Malika
This project addresses the need for increased expressivity and robustness of the ontologies already supporting BACIIS and SIBIOS, two systems for data and service integration in the life sciences. The previous ontology solutions, acting as global schema and facilitator of service discovery, served the purposes for which they were built, but needed updating to keep up with more recent standards in ontology description and utilization, as well as to increase the breadth of the domain and the expressivity of the content. Thus, several tasks were undertaken to increase the worth of the system ontologies. These include an upgrade to a more recent ontology language standard, increased domain coverage, increased expressivity via additions of relationships and hierarchies within the ontology, and increased ease of maintenance through a distributed design.

Item: DACS-DB: An Annotation and Dissemination Model for Disease Associated Cytokine SNPs (2011-10-19)
Bhushan, Sushant; Perumal, Narayanan B.; Mahoui, Malika; Skaar, Todd
Cytokines mediate crucial functions in innate and adaptive immunity. They play valuable roles in immune cell growth and lineage specification, and are associated with various disease pathologies. A large number of low, medium and high throughput studies have implicated association of single nucleotide polymorphisms (SNPs) in cytokine genes with diseases. Most such experiments have not shown any causality of an identified SNP to the associated disease.
Instead, they have identified statistically significant SNP-disease associations; hence, it is likely that some of these cytokine gene variants may directly or indirectly cause the disease phenotype(s). To fill this knowledge gap and derive study parameters for cytokine SNP-disease causality relationships, we have designed and developed the Disease Associated Cytokine SNP Database (DACS-DB). DACS-DB has data on 456 cytokine genes, approximately 61,000 SNPs, and 891 SNP-associated diseases. In DACS-DB, among other attributes, we present functional annotation and heterozygosity allele frequency for the SNPs, and literature-validated SNP associations for diseases. Users of the DB can run queries, for example to find the disease-associated SNPs in a cytokine gene or all the SNPs involved in a disease. We have developed a web front end (available at http://www.iupui.edu/~cytosnp) to disseminate this information to immunologists, biomedical researchers, and other interested biological researchers. Since there is no such comprehensive collection of disease-associated cytokine SNPs, this DB will be vital to understanding the role of cytokine SNPs as markers in disease and, more importantly, in disease causality, thus helping to identify drug targets for common inflammatory diseases. Due to the presence of rich annotations, the DACS-DB can be a good source for building a tool for the prediction of the "disease association potential (DAP)" of a given SNP. In a preliminary effort to devise such a methodology for DAP prediction, we have applied a support vector machine (SVM) to classify SNPs. Employing the SNP attributes of function class, heterozygosity value, and heterozygosity standard error, 864 SNPs were classified into two classes, "disease" and "non-disease". The SVM returned a classification of these SNPs into the disease and non-disease classes with an accuracy of 74%.
By modifying various SNP and disease attributes in the training data sets, such a predictive algorithm can be extrapolated to identify potential disease-associated SNPs among newly sequenced cytokine variations. In the long run, this approach can provide a means for future gene variation based therapeutic regimens.

Item: An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences (2013-11-12)
Newsom, Eric Tyner; Jones, Josette F.; Gamache, Roland E.; Mahoui, Malika
The amount of information produced in the form of electronic free text in healthcare is increasing to levels that humans cannot process to advance their professional practice. Information extraction (IE) is a sub-field of natural language processing with the goal of data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create a logical expression necessary for processing the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none have addressed the issue of semantic similarity (or synonymy) to achieve data reduction. A successful methodology for data reduction depends on a framework that can represent currently popular phrasal methods of IE but also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument statement (PAS) as the framework.
A convenience sample from a prior study, with ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing, will be the text from which PAS structures are formed.

Item: EXPLORING HEALTH WEBSITE USERS BY WEB MINING (Universal Access in Human-Computer Interaction. Applications and Services Lecture Notes in Computer Science, 2011, Volume 6768/2011, 376-383, DOI: 10.1007/978-3-642-21657-2_40, 2011-07)
Kong, Wei; Jones, Josette F.; Mahoui, Malika; Kharrazi, Hadi
With the continuous growth of health information on the Internet, providing user-orientated health service online has become a great challenge to health providers. Understanding the information needs of the users is the first step to providing tailored health service. The purpose of this study is to examine the navigation behavior of different user groups by extracting their search terms and to make some suggestions to reconstruct a website for more customized Web service. This study analyzed five months of daily access weblog files from one local health provider’s website, discovered the most popular general topics and health-related topics, and compared the information search strategies of the patient/consumer and doctor groups. Our findings show that users are not searching for health information as much as was thought. The top two health topics which patients are concerned about are children’s health and occupational health. Another topic that both user groups are interested in is medical records. Also, patients and doctors have different search strategies when looking for information on this website. Patients return to the previous page more often, while doctors usually go to the final page directly and then leave the page without coming back.
As a result, some suggestions to redesign and improve the website are discussed; a more intuitive portal and more customized links for both user groups are suggested.

Item: An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules (2013-08-14)
Morton, Stuart Michael; Mahoui, Malika; Palakal, Mathew J.; Gibson, P. Joseph; Kharrazi, Hadi
As medical data continues to transition to electronic formats, opportunities arise for researchers to use this microdata to discover patterns and increase knowledge that can improve patient care. Now more than ever, it is critical to protect the identities of the patients contained in these databases. Even after removing obvious “identifier” attributes, such as social security numbers or first and last names, that clearly identify a specific person, it is possible to join “quasi-identifier” attributes from two or more publicly available databases to identify individuals. K-anonymity is an approach that has been used to ensure that no one individual can be distinguished within a group of at least k individuals. However, the majority of the proposed approaches implementing k-anonymity have focused on improving the efficiency of the algorithms; less emphasis has been put on ensuring the “utility” of anonymized data from a researcher’s perspective. We propose a new data utility measurement, called the research value (RV), which extends existing utility measurements by employing data constraint rules that are designed to improve the effectiveness of queries against the anonymized data. To anonymize a given raw dataset, two algorithms are proposed that use predefined generalizations provided by the data content expert and their corresponding research values to assess an attribute’s data utility as it is generalizing the data to ensure k-anonymity. In addition, an automated algorithm is presented that uses clustering and the RV to anonymize the dataset.
All of the proposed algorithms scale efficiently when the number of attributes in a dataset is large.

Item: Large Scale Semantic Annotation of Radiology Reports (Office of the Vice Chancellor for Research, 2010-04-09)
Mahoui, Malika; Kashyap, Vinay; Jamieson, Patrick; Jones, Josette; Friedlin, Jeffrey
The development and testing of automated information extraction (IE) systems depends on semantically annotated free text. This presentation reports on the results of a large scale annotation project of a radiology corpus, the Roentgen corpus, consisting of 594,000 deidentified radiology reports with 36 million words and 4.3 million sentences, supplied by Indiana University. The presentation highlights (1) the sentence-based approach to defining the propositions that annotate the corpus and (2) the annotation framework that is incrementally built and refined in order to facilitate the process of annotation.

Item: Machine learning to predict coronary artery disease using proteomics biomarkers (Office of the Vice Chancellor for Research, 2010-04-09)
Mahoui, Malika
Coronary artery disease (CAD) is the leading cause of morbidity and mortality in the United States and is greatly exacerbated by metabolic syndrome (MetS). Current techniques to diagnose CAD are invasive and expensive, and their appropriateness varies among physicians. Therefore, they cannot be used as a routine screening test to predict CAD. To assess the severity of CAD, the diagnostic tests determine, with various degrees of accuracy, the percentage level of phenotypes such as atheroma wall coverage, stenosis, and plaque composition in the coronary arteries. These phenotypes are measured using invasive methods such as IVUS, in contrast to other phenotypes such as insulin level, which do not require invasive methods but are less accurate for diagnostic purposes.
In addition to predicting CAD, there is a need to improve early screening of the disease without having to use invasive methods such as IVUS. The objective of the study described in this poster is to develop an accurate and non-invasive informatics approach to facilitate screening and monitoring of patients with CAD using a combination of plasma proteomics data and the non-invasively generated phenotypes (e.g. insulin level). This study concentrates on using a machine-learning approach to predict ranges of values (e.g. low, moderate, high percentage) for the invasively generated phenotypes, with a special focus on atheroma wall coverage. The ranges of values are mapped to different stages of CAD.

Item: Medication Adherence Prediction Through Online Social Forums: A Case Study of Fibromyalgia (JMIR, 2019)
Haas, Kyle; Ben Miled, Zina; Mahoui, Malika; Electrical and Computer Engineering, School of Engineering and Technology
Background: Medication nonadherence can compound into severe medical problems for patients. Identifying patients who are likely to become nonadherent may help reduce these problems. Data-driven machine learning models can predict medication adherence by using selected indicators from patients’ past health records. Sources of data for these models traditionally fall under two main categories: (1) proprietary data from insurance claims, pharmacy prescriptions, or electronic medical records and (2) survey data collected from representative groups of patients. Models developed using these data sources often are limited because they are proprietary, subject to high cost, have limited scalability, or lack timely accessibility. These limitations suggest that social health forums might be an alternate source of data for adherence prediction. Indeed, these data are accessible, affordable, timely, and available at scale. However, they can be inaccurate.
Objective: This paper proposes a medication adherence machine learning model for fibromyalgia therapies that can mitigate the inaccuracy of social health forum data.
Methods: Transfer learning is a machine learning technique that allows knowledge acquired from one dataset to be transferred to another dataset. In this study, predictive adherence models for the target disease were first developed by using accurate but limited survey data. These models were then used to predict medication adherence from health social forum data. Random forest, an ensemble machine learning technique, was used to develop the predictive models. This transfer learning methodology is demonstrated in this study by examining data from the Medical Expenditure Panel Survey and the PatientsLikeMe social health forum.
Results: When the models are carefully designed, less than a 5% difference in accuracy is observed between the Medical Expenditure Panel Survey and the PatientsLikeMe medication adherence predictions for fibromyalgia treatments. This design must take into consideration the mapping between the predictors and the outcomes in the two datasets.
Conclusions: This study exemplifies the potential and limitations of transfer learning in medication adherence–predictive models based on survey data and social health forum data. The proposed approach can make timely medication adherence monitoring cost-effective and widely accessible. Additional investigation is needed to improve the robustness of the approach and extend its applicability to other therapies and other sources of data. [JMIR Med Inform 2019;7(2):e12561]

Item: Prediction by Partial Matching for Identification of Biological Entities
Thirumalaiswamy Sekhar, Arvind Kumar; Mahoui, Malika
As biomedical research and advances in biotechnology generate expansive datasets, the need to process this data into information has grown simultaneously.
Specifically, recognizing and extracting the “key” phrases comprising the named entities from this information databank promises a plethora of applications for scientists. Constructing interaction maps and identifying proteins as drug targets are two important applications. Since we have the choice of defining what is “useful”, we can potentially utilize text mining for our purpose. In a novel attempt to meet the challenge, we have applied information theory and text compression to this task. Prediction by partial matching is an adaptive text encoding scheme that blends together a set of finite context Markov models to predict the probability of the next token in a given symbol stream. We observe that named entities such as gene names, protein names, gene functions, and protein-protein interactions all follow symbol statistics uniquely different from normal scientific text. By using well-defined training sets that allow us to selectively differentiate between named entities and the rest of the symbols, we were able to extract them with good accuracy. We have implemented our tests, using the Text Mining Toolkit, on identification of gene functions and protein-protein interactions, with F-scores (based on precision and recall) of 0.9737 and 0.6865 respectively. With our results, we foresee the application of such an approach in automated information retrieval in the realm of biology.
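
The idea in the last abstract, blending finite-context Markov models and labelling a phrase by whichever model encodes it in fewer bits, can be illustrated with a small sketch. This is not the authors' implementation and does not use the Text Mining Toolkit. It is a simplified character-level PPM-style model (a method-C-like escape estimate, no exclusions), and the class `PPMModel`, the helpers `classify` and `demo_models`, and the tiny training strings are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class PPMModel:
    """Simplified PPM-style character model (hypothetical sketch, no exclusions)."""

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed size of the fallback uniform model
        # counts[k][context] holds symbol counts seen after a length-k context
        self.counts = [defaultdict(Counter) for _ in range(order + 1)]

    def train(self, text):
        for i, sym in enumerate(text):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[k][text[i - k:i]][sym] += 1

    def prob(self, history, sym):
        """Estimate P(sym | history), backing off from the longest context."""
        p_escape = 1.0
        for k in range(min(self.order, len(history)), -1, -1):
            seen = self.counts[k].get(history[len(history) - k:])
            if not seen:
                continue  # unseen context: fall through to a shorter one
            total, distinct = sum(seen.values()), len(seen)
            if sym in seen:
                return p_escape * seen[sym] / (total + distinct)
            # symbol is novel in this context: pay the escape probability
            p_escape *= distinct / (total + distinct)
        return p_escape / self.alphabet_size  # uniform fallback for unseen symbols

    def bits(self, text):
        """Code length of text under this model, in bits."""
        return -sum(math.log2(self.prob(text[:i], s)) for i, s in enumerate(text))


def classify(phrase, models):
    """Return the label of the model that encodes the phrase most compactly."""
    return min(models, key=lambda label: models[label].bits(phrase))


def demo_models():
    """Build two toy models; the training strings are made up for illustration."""
    entity = PPMModel(order=2)
    entity.train("BRCA1 TP53 EGFR KRAS CDK2 MAPK1 AKT1 ")
    text = PPMModel(order=2)
    text.train("the protein was found in the cell and the results show that it binds ")
    return {"entity": entity, "text": text}
```

Calling `classify("BRCA2", demo_models())` selects the model trained on gene-like symbols because that model assigns the phrase a shorter code length. A real system would need far larger training sets and a full PPM implementation with exclusions.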