Browsing by Subject "bioinformatics"
Now showing 1 - 10 of 14
Item: A Computational Framework for Investigating mRNA Localization Patterns in Pancreatic Beta-Cells During Type 1 Diabetes Progression (2024-12)
Chang, Hok Wai; Petrache, Horia; Liu, Jing; Wassall, Stephen; Vemuri, Gautam; Syed, Farooq
Spatial transcriptomics improves transcriptomic studies by incorporating RNA localization information, which provides deeper insight into cellular functions, interactions between cells, and their reactions to external stimuli. Single-molecule fluorescent in situ hybridization (smFISH) is a commonly used spatial transcriptomics technique that allows accurate visualization of mRNA distribution in cells. This method aids in the quantitative evaluation of mRNA localization patterns by utilizing various physical properties, thereby illuminating processes such as transcription, nuclear export, and localized translation. Nevertheless, existing computational approaches for analyzing smFISH images often have constraints, concentrating primarily on cellular expression or specific biological contexts while overlooking broader physical analysis. In my PhD research, I created STProfiler, a comprehensive tool aimed at unbiased physical examination of mRNA distribution. STProfiler includes an image analysis workflow that processes raw biological images to detect mRNA and nuclei, and it employs machine learning techniques to biologically interpret mRNA spatial characteristics and categorize cells based on these features. My dissertation illustrates the use of STProfiler in multiple studies investigating the transcriptomic profiles of β-cells during the progression of type 1 diabetes (T1D), uncovering spatial transcriptomic diversity in β-cells.
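The abstract does not include STProfiler's code, but the core smFISH image-analysis step it describes, detecting individual mRNA spots in a microscopy image, can be sketched as local-maximum detection. The `detect_spots` helper and its parameters below are assumptions for illustration, not STProfiler's actual API:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def detect_spots(image, sigma=1.0, threshold=0.5, window=5):
    """Flag pixels that are local intensity maxima above a threshold.

    Hypothetical helper: smooth to suppress noise, then keep pixels that
    equal the maximum of their neighborhood and exceed the threshold.
    """
    smoothed = gaussian_filter(image.astype(float), sigma)
    is_peak = (maximum_filter(smoothed, size=window) == smoothed) & (smoothed > threshold)
    return np.argwhere(is_peak)  # (row, col) coordinates of candidate spots

# Synthetic 64x64 field with two point emitters standing in for mRNA spots
img = np.zeros((64, 64))
img[20, 20] = 10.0
img[40, 45] = 10.0
spots = detect_spots(img)  # two candidate spots at the emitter locations
```

Real pipelines add point-spread-function fitting and nuclear segmentation on top of a detector like this; the sketch only shows the thresholded local-maximum idea.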
These investigations involve analyzing mRNA clusters and stress granules in pancreatic β-cells, measuring the physical characteristics of mRNAs linked to cellular stress and inflammation in mice developing T1D, evaluating the rise in an HLA-DMB mRNA spliced variant in T1D, and exploring miRNA as a potential biomarker for T1D. Furthermore, STProfiler has also proven beneficial in tissue-wide spatial transcriptomics by creating masks for nuclei and cells from biological images and assigning mRNA transcripts to develop subcellular expression profiles, allowing for more thorough bioinformatic evaluations. In summary, STProfiler serves as a robust tool for both cell- and tissue-level spatial transcriptomics, offering an unbiased platform for researchers to investigate complex transcriptomic variations within cells.

Item: Application of Data Pipelining Technology in Cheminformatics and Bioinformatics (2002-12)
Mao, Linyong; Perry, Douglas G.
Data pipelining is the processing, analysis, and mining of large volumes of data through a branching network of computational steps. A data pipelining system consists of a collection of modular computational components and a network for streaming data between them. By defining a logical path for data through a network of computational components and configuring each component accordingly, a user can create a protocol to perform virtually any desired function with data and extract knowledge from them. A set of data pipelines was constructed to explore the relationship between the biodegradability and structural properties of halogenated aliphatic compounds in a data set in which each compound has one degradation rate and nine structure-derived properties. After training, the data pipeline was able to predict the degradation rates of new compounds with reasonable accuracy. A second set of data pipelines was generated to cluster new DNA sequences.
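The chained-component design that data pipelining describes can be sketched in a few lines. This minimal `Pipeline` class is a hypothetical illustration of the idea (an ordered chain of configurable components streaming data between them), not the system used in the thesis:

```python
from functools import reduce

class Pipeline:
    """A minimal linear data pipeline: an ordered chain of components.

    Each component is any callable that takes the data and returns the
    transformed data; run() streams the input through all of them in order.
    """
    def __init__(self):
        self.steps = []

    def add(self, func):
        self.steps.append(func)
        return self  # allow chaining: pipe.add(f).add(g)

    def run(self, data):
        return reduce(lambda d, step: step(d), self.steps, data)

# Example protocol: coerce property values to floats, then scale to [0, 1]
pipe = (Pipeline()
        .add(lambda xs: [float(x) for x in xs])
        .add(lambda xs: [x / max(xs) for x in xs]))
result = pipe.run([2, 4, 8])  # → [0.25, 0.5, 1.0]
```

A full pipelining system additionally supports branching networks of components; this sketch shows only the linear case.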
The data pipelining technology was applied to identify a core sequence to represent a DNA cluster and construct the 95% confidence distance interval for the cluster. The results show that 74% of the DNA sequences were correctly clustered, with no false clustering.

Item: Bioinformatics and Pharmacogenomics in Drug Discovery and Development
Anyanwu, Chukwuma Eustace; Jones, Josette
Objective: Literature review to evaluate the extent to which Bioinformatics has facilitated the drug discovery and development process from an economic perspective.
Problem: A plethora of genomic and proteomic information was uncovered by the U.S. Human Genome Project (HGP). Despite the impact that Bioinformatics and Pharmacogenomics were projected to have on the drug discovery and development process, the challenges facing pharmaceutical companies in this regard still persist.
Design: An extensive integrated literature review of library resources such as MEDLINE, ERIC, PsycINFO, EconLit, Social Services Abstracts, ABI/INFORM, and LISA (all 1990 - Present). These electronic databases were searched because of their focus on the healthcare sector, medical and scientific innovations, economic modeling and analysis, bioinformatics and computational biology, applied social research, and technology applications. Semi-structured interviews of Bioinformatics professionals were also conducted to complement the literature review, and Internet-based databases from reliable sources were searched as well, resulting in serendipitous discoveries.
Sample: Published English-language reports of studies and research carried out worldwide from 1990 to 2004, relating to drug discovery and development.
Selection criteria: Primary focus was on research publications and journals that identify and discuss the practice of Bioinformatics, especially in the area of drug discovery and development.
Premium was placed on articles and publications that discussed the economic impacts of Bioinformatics in the drug discovery process.
Results: Though the goals of Bioinformatics have been clearly defined, and the discipline is widely practiced in the pharmaceutical industry, this study found no definite attempts to evaluate its economic and regulatory impact specifically in facilitating the drug discovery and development process and the delivery of personalized drugs.
Discussion: Bioinformatics and Pharmacogenomics are the new facets of the ever-evolving drug discovery and development process. It may still be a while before their full impact and potential are realized.

Item: Bioinformatics and Pharmacogenomics in Drug Discovery and Development - a Socio-economic Perspective (2006-07-26)
Anyanwu, Chukwuma Eustace; Jones, Josette
A plethora of genomic and proteomic information was uncovered by the U.S. Human Genome Project (HGP), mostly by means of bioinformatics tools and techniques. Despite the impact that bioinformatics and pharmacogenomics were projected to have on the drug discovery and development process, the challenges facing the pharmaceutical industry, such as the high cost and slow pace of drug development, appear to persist. Socio-economic barriers exist that hinder the full integration of bioinformatics and pharmacogenomics into the drug discovery and development process, limiting the desired and expected effects.

Item: A Biological and Bioinformatics Ontology for Service Discovery and Data Integration (2006-07-26)
Dippold, Mindi M.; Mahoui, Malika
This project addresses the need for increased expressivity and robustness of the ontologies already supporting BACIIS and SIBIOS, two systems for data and service integration in the life sciences.
The previous ontology solutions, serving as a global schema and facilitator of service discovery, fulfilled the purposes for which they were built, but needed updating to keep up with more recent standards in ontology description and utilization, and to increase the breadth of the domain and the expressivity of the content. Thus, several tasks were undertaken to increase the worth of the system ontologies. These include an upgrade to a more recent ontology language standard, increased domain coverage, increased expressivity via added relationships and hierarchies within the ontology, and easier maintenance through a distributed design.

Item: Comparison of Multi-Sample Variant Calling Methods for Whole Genome Sequencing (Institute of Electrical and Electronics Engineers, 2014-10)
Nho, Kwangsik; West, John D.; Li, Huian; Henschel, Robert; Bharthur, Apoorva; Tavares, Michel C.; Saykin, Andrew J.
Department of Medicine, IU School of Medicine
Rapid advancement of next-generation sequencing (NGS) technologies has facilitated the search for genetic susceptibility factors that influence disease risk in the field of human genetics. In particular, whole genome sequencing (WGS) has been used to obtain the most comprehensive genetic variation of an individual and to perform detailed evaluation of all genetic variation. To this end, sophisticated methods are required to accurately call high-quality variants and genotypes simultaneously on a cohort of individuals from raw sequence data.
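Comparing two such calling strategies typically reduces to a genotype concordance rate over the sites both methods called. A minimal sketch of that metric, with made-up site keys and genotype strings purely for illustration:

```python
def concordance(calls_a, calls_b):
    """Fraction of sites called by both methods with identical genotypes."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    agree = sum(calls_a[site] == calls_b[site] for site in shared)
    return agree / len(shared)

# Hypothetical genotype calls keyed by (chrom, pos, ref, alt)
joint_calls = {("22", 100, "A", "G"): "0/1",
               ("22", 200, "C", "T"): "1/1",
               ("22", 300, "G", "A"): "0/1"}
reduce_calls = {("22", 100, "A", "G"): "0/1",
                ("22", 200, "C", "T"): "0/1",
                ("22", 300, "G", "A"): "0/1"}
rate = concordance(joint_calls, reduce_calls)  # 2 of 3 shared sites agree
```

Published comparisons compute this over millions of sites from VCF files; the dictionary form just makes the definition concrete.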
Using chromosome 22 of 818 WGS samples from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the largest WGS dataset related to a single disease, we compared two multi-sample variant calling methods for the detection of single nucleotide variants (SNVs) and short insertions and deletions (indels) in WGS: (1) reduce the analysis-ready reads (BAM) file to a manageable size by keeping only essential information for variant calling ("REDUCE"), and (2) call variants individually on each sample and then perform a joint genotyping analysis of the variant files produced for all samples in a cohort ("JOINT"). JOINT identified 515,210 SNVs and 60,042 indels, while REDUCE identified 358,303 SNVs and 52,855 indels; JOINT thus identified many more SNVs and indels than REDUCE. The two methods had a concordance rate of 99.60% for SNVs and 99.06% for indels. For SNVs, evaluation with HumanOmni 2.5M genotyping arrays revealed a concordance rate of 99.68% for JOINT and 99.50% for REDUCE. REDUCE needed more computational time and memory than JOINT. Our findings indicate that the multi-sample variant calling method using the JOINT process is a promising strategy for variant detection, which should facilitate our understanding of the underlying pathogenesis of human diseases.

Item: Computational Analysis of Flow Cytometry Data (2013-07-12)
Irvine, Allison W.; Dundar, Murat; Tuceryan, Mihran; Mukhopadhyay, Snehasis; Fang, Shiaofen
The objective of this thesis is to compare automated methods for analyzing flow cytometry data. Flow cytometry is an important and efficient tool for analyzing the characteristics of cells. It is used in several fields, including immunology, pathology, marine biology, and molecular biology. Flow cytometry measures light scatter from cells and fluorescent emission from dyes attached to cells. There are two main tasks that must be performed.
The first is the adjustment of measured fluorescence from the cells to correct for the overlap of the spectra of the fluorescent markers used to characterize a cell's chemical characteristics. The second is to use the amount of markers present in each cell to identify its phenotype. Several methods are compared for these tasks. The Unconstrained Least Squares, Orthogonal Subspace Projection, Fully Constrained Least Squares, and Fully Constrained One Norm methods are compared for compensation. The Fully Constrained Least Squares method of compensation gives the overall best results in terms of accuracy and running time. Spectral Clustering, Gaussian Mixture Modeling, Naive Bayes classification, Support Vector Machines, and Expectation Maximization using a Gaussian mixture model are used to classify cells based on the amounts of dyes present in each cell. The generative models created by the Naive Bayes and Gaussian Mixture Modeling methods performed classification most accurately. These supervised methods may be the most useful when online classification is necessary, such as in cell sorting applications of flow cytometers. Unsupervised methods may be used to completely replace manual analysis when no training data are given. Expectation Maximization combined with a cluster-merging post-processing step gives the best results of the unsupervised methods considered.

Item: Computational integration and meta-analysis of abandoned cardio-(vascular/renal/metabolic) therapeutics discontinued during clinical trials from 2011 to 2022 (Frontiers, 2023-02)
Zeng, Carisa; Lee, Yoon Seo; Szatrowski, Austin; Mero, Deniel; Khomtchouk, Bohdan B.
Biohealth Informatics, School of Informatics and Computing
Cardiovascular/renal/metabolic (CVRM) diseases collectively comprise the leading cause of death worldwide and disproportionately affect older demographics and historically underrepresented minority populations.
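The spectral compensation task compared in the flow-cytometry thesis above is essentially constrained spectral unmixing. A minimal sketch using SciPy's non-negative least squares: the spectra matrix below is invented for illustration, and `nnls` enforces only non-negativity, not the additional sum constraint of the Fully Constrained Least Squares method the thesis evaluates.

```python
import numpy as np
from scipy.optimize import nnls

# Columns of S: emission profile of each of two dyes across three detectors
# (illustrative values, not real dye spectra)
S = np.array([[0.9, 0.2],
              [0.1, 0.7],
              [0.0, 0.1]])
true_amounts = np.array([3.0, 2.0])
measured = S @ true_amounts  # spectrally overlapping signal at the detectors

# Non-negativity-constrained unmixing: dye amounts cannot be negative
amounts, residual = nnls(S, measured)  # recovers approximately [3.0, 2.0]
```

In practice the spectra matrix is estimated from single-stained control samples, and the unmixing is applied per event (cell).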
Despite these critical unmet needs, pharmaceutical research and development (R&D) efforts have historically struggled with high drug failure rates, low approval rates, and other challenges. Drug repurposing is one approach to recovering R&D costs and meeting unmet demands in therapeutic markets. While there are multiple approaches to conducting drug repurposing, we recognize the importance of bringing together and consolidating discontinued drug information to help identify prospective repurposing candidates. In this study, we have harmonized and integrated information on all relevant CVRM drug assets from U.S. Securities and Exchange Commission (SEC) filings, clinical trial records, PharmGKB, Open Targets, and other platforms. A list of existing therapeutics discontinued or shelved by pharmaceutical/biotechnology companies in 2011-2022 was manually curated and interpreted for insights using information on each drug's genetic target, mechanism of action (MOA), clinical indication, and R&D information including highest phase of clinical development, year of discontinuation, previous repurposing attempts (if any), and other actionable metadata. This study also summarizes the profiles of CVRM drugs discontinued within the past decade and identifies the limitations of publicly available information on discontinued drug assets. The constructed database could serve as a tool for identifying candidates for drug repurposing and for developing query methods for collecting R&D information.

Item: Drug Selection via Joint Push and Learning to Rank (IEEE, 2018-06)
He, Yicheng; Liu, Junfeng; Ning, Xia
Medical and Molecular Genetics, School of Medicine
Selecting the right drugs for the right patients is a primary goal of precision medicine. In this manuscript, we consider the problem of cancer drug selection in a learning-to-rank framework. We have formulated the cancer drug selection problem as accurately predicting (1) the ranking positions of sensitive drugs and (2)
the ranking orders among sensitive drugs in cancer cell lines based on their responses to cancer drugs. We have developed a new learning-to-rank method, denoted pLETORg, that predicts drug ranking structures in each cell line using drug latent vectors and cell line latent vectors. The pLETORg method learns such latent vectors by explicitly enforcing that, in the drug ranking list of each cell line, sensitive drugs are pushed above insensitive drugs while the ranking orders among sensitive drugs are correct. Genomic information on cell lines is leveraged in learning the latent vectors. Our experimental results on a benchmark cell line-drug response dataset demonstrate that pLETORg significantly outperforms the state-of-the-art method in prioritizing new sensitive drugs.

Item: LymphTF Database - A Database of Transcription Factor Activity in Lymphocyte Development (2006-07-26)
Childress, Paul
Study of the transcriptional regulation of lymphocyte development has advanced greatly in the past 15 years. Owing to improved techniques and intense interest in the topic, a great many interactions between transcription factors and their target genes have been described. For B and T cells, a clearer picture is beginning to emerge of how they start with a common progenitor cell and progressively restrict their potential to give many different types of terminally differentiated cells. As B and T cells develop, they both follow a roughly similar path that involves early stepwise progression to later stages where multiple developmental options are available. To progress in this developmental regime, they share requirements for proper anatomical location and for successful rearrangement of germ-line DNA to give the plethora of antibodies and T cell receptors seen in the immune system.
Because the amount of information is quickly becoming more than researchers can assimilate, a knowledge gap has opened between what is known about transcription factor activities during this process and what any one individual can recall. To help fill this gap, we have created the LymphTF Database. This database holds interactions between individual transcription factors and their specific targets at a given developmental time. It is our hope that storing the interactions in developmental time will allow for elucidation of the regulatory networks that guide the process. Work for this project also included construction of a custom data-entry web page that automates many tasks associated with populating the database tables. These tables have also been related in multiple ways to allow for storage of incomplete information on transcription factor activity, without having to replace existing records as details become available. The LymphTF DB is a relational MySQL database which can be accessed freely on the web at http://www.iupui.edu/~tfinterx/.
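A relational layout like the one the LymphTF abstract describes, interactions between factors and targets at a given developmental time, might look as follows. The table and column names and the Pax5/Cd19 example row are assumptions for illustration (shown here with SQLite for a self-contained sketch), not the actual LymphTF DB schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE factor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE target (id INTEGER PRIMARY KEY, gene TEXT NOT NULL);
-- One row per factor-target interaction; stage/effect may be NULL so
-- incomplete interactions can be stored and filled in later without
-- replacing the record.
CREATE TABLE interaction (
    factor_id INTEGER REFERENCES factor(id),
    target_id INTEGER REFERENCES target(id),
    stage     TEXT,   -- developmental stage, NULL if not yet known
    effect    TEXT    -- e.g. 'activates' / 'represses'
);
""")
conn.execute("INSERT INTO factor VALUES (1, 'Pax5')")
conn.execute("INSERT INTO target VALUES (1, 'Cd19')")
conn.execute("INSERT INTO interaction VALUES (1, 1, 'pro-B', 'activates')")
row = conn.execute("""
    SELECT f.name, t.gene, i.stage
    FROM interaction i
    JOIN factor f ON f.id = i.factor_id
    JOIN target t ON t.id = i.target_id
""").fetchone()
```

Keeping the stage on the interaction row, rather than on the factor or target, is what lets one query how a network changes across developmental time.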