- Browse by Subject
Browsing by Subject "Literature mining"
Now showing 1 - 2 of 2
Results Per Page
Sort Options
Item Computational biology approaches in drug repurposing and gene essentiality screening(2016-06-20) Philips, Santosh; Li, Lang; Liu, Yunlong; Liu, Xiaowen; Skaar, Todd C.; Janga, Sarath C.The rapid innovations in biotechnology have led to an exponential growth of data and electronically accessible scientific literature. In this enormous scientific data, knowledge can be exploited, and novel discoveries can be made. In my dissertation, I have focused on the novel molecular mechanism and therapeutic discoveries from big data for complex diseases. It is very evident today that complex diseases have many factors including genetics and environmental effects. The discovery of these factors is challenging and critical in personalized medicine. The increasing cost and time to develop new drugs poses a new challenge in effectively treating complex diseases. In this dissertation, we want to demonstrate that the use of existing data and literature as a potential resource for discovering novel therapies and in repositioning existing drugs. The key to identifying novel knowledge is in integrating information from decades of research across the different scientific disciplines to uncover interactions that are not explicitly stated. This puts critical information at the fingertips of researchers and clinicians who can take advantage of this newly acquired knowledge to make informed decisions. This dissertation utilizes computational biology methods to identify and integrate existing scientific data and literature resources in the discovery of novel molecular targets and drugs that can be repurposed. In chapters 1 of my dissertation, I extensively sifted through scientific literature and identified a novel interaction between Vitamin A and CYP19A1 that could lead to a potential increase in the production of estrogens. Further in chapter 2 by exploring a microarray dataset from an estradiol gene sensitivity study I was able to identify a potential novel anti-estrogenic indication for the commonly used urinary analgesic, phenazopyridine. Both discoveries were experimentally validated in the laboratory. In chapter 3 of my dissertation, through the use of a manually curated corpus and machine learning algorithms, I identified and extracted genes that are essential for cell survival. These results brighten the reality that novel knowledge with potential clinical applications can be discovered from existing data and literature by integrating information across various scientific disciplines.Item Using machine learning algorithms to identify genes essential for cell survival(BMC, 2017-10-03) Philips, Santosh; Wu, Heng-Yi; Li, Lang; Medical and Molecular Genetics, School of MedicineBackground With the explosion of data comes a proportional opportunity to identify novel knowledge with the potential for application in targeted therapies. In spite of this huge amounts of data, the solutions to treating complex disease is elusive. One reason being that these diseases are driven by a network of genes that need to be targeted in order to understand and treat them effectively. Part of the solution lies in mining and integrating information from various disciplines. Here we propose a machine learning method to mining through publicly available literature on RNA interference with the goal of identifying genes essential for cell survival. Results A total of 32,164 RNA interference abstracts were identified from 10.5 million pubmed abstracts (2001 - 2015). These abstracts spanned over 1467 cancer cell lines and 4373 genes representing a total of 25,891 cell gene associations. Among the 1467 cell lines 88% of them had at least 1 or up to 25 genes studied in a given cell line. Among the 4373 genes 96% of them were studied in at least 1 or up to 25 different cell lines. Conclusions Identifying genes that are crucial for cell survival can be a critical piece of information especially in treating complex diseases, such as cancer. The efficacy of a therapeutic intervention is multifactorial in nature and in many cases the source of therapeutic disruption could be from an unsuspected source. Machine learning algorithms helps to narrow down the search and provides information about essential genes in different cancer types. It also provides the building blocks to generate a network of interconnected genes and processes. The information thus gained can be used to generate hypothesis which can be experimentally validated to improve our understanding of what triggers and maintains the growth of cancerous cells. Electronic supplementary material The online version of this article (10.1186/s12859-017-1799-1) contains supplementary material, which is available to authorized users.