Sarath Janga


Rapid Detection of Viral Based Infectious Diseases

The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which emerged in Wuhan (Hubei, China) in December 2019, has been declared a pandemic by the World Health Organization (WHO) because of the virus's easy human-to-human transmission, making it a global public health concern. Coronaviruses are enveloped, single-stranded ribonucleic acid (RNA) viruses with a characteristic “crown”-like appearance under two-dimensional transmission electron microscopy. Infections caused by these viruses result in severe pneumonia, fever and breathing difficulty. Currently, the lack of effective vaccines and antiviral medication has led to a global outbreak of SARS-CoV-2. Because coronaviruses evolve rapidly, identifying them has become increasingly challenging. It is therefore important to develop diagnostic methods that can detect the virus rapidly, to prevent its transmission. Currently, most clinical diagnostic tests for viruses either detect a viral antigen or rely on PCR amplification of viral nucleic acid derived from biological samples. These two approaches offer different trade-offs: antigen tests (including current Point-Of-Care Tests [POCT]) are typically rapid but have low sensitivity, while PCR is more time-consuming but also more sensitive. Irrespective of the test used, most clinical diagnostic facilities report a non-quantitative (binary) diagnostic result, and the data generated offer limited insight into epidemiological linkage, vaccine efficacy, or antiviral susceptibility. Hence, there is an urgent need for new diagnostic tests that combine point-of-care use, speed, sensitivity, detection of coinfection by other viral strains, and generation of quantitative or semi-quantitative data that can be used to identify drug resistance. Such data may also be used to reconstruct phylogenies to inform surveillance, public health strategy, and vaccine design.

Dr. Sarath Janga's lab has been employing “third-generation” portable, real-time benchtop sequencers based on nanopores to develop novel experimental protocols and computational algorithms that not only detect the presence of pathogens but also map their variability across clinical samples, to facilitate public health surveillance. More recently, his lab has been combining efficient, novel, high-throughput viral RNA isolation methods with nanopore sequencing to develop automated computational software for real-time detection of COVID-19.

Dr. Janga's research on detecting COVID-19 virus strains, aimed at developing a rapid, real-time and scalable test that clinics can use to help healthcare workers who are on the front lines of care and exposed to infection, is another example of how IUPUI's faculty are TRANSLATING their RESEARCH INTO PRACTICE.


Recent Submissions

Now showing 1 - 10 of 115
  • Item
    Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets
    (BMC, 2021-07-07) Koonchanok, Ratanond; Daulatabad, Swapna Vidhur; Mir, Quoseena; Reda, Khairi; Janga, Sarath Chandra; Human-Centered Computing, School of Informatics and Computing
    Background: Direct-sequencing technologies, such as Oxford Nanopore's, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia's interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users develop rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia .
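Clustering reads "based on electric-current similarities" can be illustrated with a minimal sketch, assuming each read is already available as a plain list of raw current values (for example, extracted from a Fast5 file beforehand). The helper names (resample, znorm, kmeans, cluster_signals) and the choice of fixed-length resampling followed by k-means are illustrative assumptions, not Sequoia's actual backend, which the abstract does not detail.

```python
# Illustrative sketch only; helper names are hypothetical and not taken from Sequoia.
import math
from statistics import mean, pstdev


def resample(signal, n_points):
    """Linearly interpolate a raw current trace onto n_points evenly spaced samples."""
    if len(signal) == 1:
        return [float(signal[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(signal) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def znorm(vec):
    """Center and scale a trace so clustering compares signal shape, not absolute current level."""
    mu, sd = mean(vec), pstdev(vec)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in vec]


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; centroids are seeded with the first k vectors so results are deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members; empty clusters keep their old centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def cluster_signals(signals, k, n_points=32):
    """Resample, normalize and cluster raw current traces; returns one cluster label per trace."""
    features = [znorm(resample(s, n_points)) for s in signals]
    return kmeans(features, k)
```

Resampling to a fixed length and z-normalizing lets traces of different durations and baselines be compared by shape; a dimensionality-reduction step of the kind tuned interactively in Sequoia could be applied to the same feature vectors before clustering.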
  • Item
    SliceIt: A genome-wide resource and visualization tool to design CRISPR/Cas9 screens for editing protein-RNA interaction sites in the human genome
    (Elsevier, 2020-06) Vemuri, Sasank; Srivastava, Rajneesh; Mir, Quoseena; Hashemikhabir, Seyedsasan; Dong, X. Charlie; Janga, Sarath Chandra; BioHealth Informatics, School of Informatics and Computing
    Several protein-RNA crosslinking protocols have been established in recent years to delineate the molecular interactions between an RNA Binding Protein (RBP) and its target RNAs. However, functional dissection of the role of RBP binding sites in modulating the post-transcriptional fate of the target RNA remains challenging. The CRISPR/Cas9 genome editing system is commonly employed to perturb both coding and noncoding regions of the genome. With advancements in genome-scale CRISPR/Cas9 screens, it is now possible not only to perturb specific binding sites but also to probe the global impact of protein-RNA interaction sites across cell types. Here, we present SliceIt (http://sliceit.soic.iupui.edu/), a database of in silico sgRNA (single guide RNA) libraries to facilitate such high-throughput screens. SliceIt comprises ~4.8 million unique sgRNAs, with an estimated 2-8 sgRNAs designed per RBP binding site, for eCLIP experiments of >100 RBPs in HepG2 and K562 cell lines from the ENCODE project. SliceIt provides a user-friendly environment developed using the Elasticsearch search engine framework. It offers both table and genome browser views, facilitating easy navigation of RBP binding sites, designed sgRNAs, and exon expression levels across 53 human tissues, along with the prevalence of SNPs and GWAS hits on binding sites. Exon expression profiles enable examination of locus-specific changes proximal to the binding sites. Users can also upload custom tracks in various file formats directly to the genome browser to navigate additional genomic features and compare them with other types of omics profiles. All binding site-centric information is dynamically accessible via "search by gene", "search by coordinates" and "search by RBP" options and is readily available to download. The sgRNA library in SliceIt was validated by selecting RBP binding sites in the Lipt1 gene and designing sgRNAs. The effect of CRISPR/Cas9 perturbations on the selected binding sites in the HepG2 cell line was confirmed by altered proximal exon expression levels measured with qPCR, further supporting the utility of the resource for designing experiments that perturb protein-RNA interaction networks. Thus, SliceIt provides a one-stop repertoire of guide RNA libraries to perturb RBP binding sites, along with several layers of functional information for designing both low- and high-throughput CRISPR/Cas9 screens to study the phenotypes and diseases associated with RBP binding sites.
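For readers unfamiliar with guide design, the sketch below shows one common way to enumerate candidate SpCas9 sgRNAs for a binding site: scan both strands for NGG PAMs, take the adjacent 20-nt protospacer, and keep guides whose predicted cut site (3 bp upstream of the PAM) falls inside the site. The function names and the half-open site coordinates are hypothetical; SliceIt's own design and filtering rules (for example, any off-target scoring) are not described in the abstract and are not reproduced here.

```python
# Illustrative sketch only; not SliceIt's pipeline. Function names are hypothetical.
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sgrnas(seq, site_start, site_end, spacer_len=20):
    """List SpCas9 guides (strand, spacer, cut) whose cut falls in the half-open site [site_start, site_end).

    `cut` is the plus-strand index of the first base after the predicted double-strand break.
    """
    seq = seq.upper()
    guides = []
    # Plus strand: protospacer at [i, i+spacer_len) followed by an NGG PAM (lookahead allows overlaps).
    for m in re.finditer(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len, seq):
        cut = m.start() + spacer_len - 3  # Cas9 cuts 3 bp upstream (5') of the PAM
        if site_start <= cut < site_end:
            guides.append(("+", m.group(1), cut))
    # Minus strand: CCN on the plus strand is NGG on the minus strand; the protospacer follows it.
    for m in re.finditer(r"(?=CC[ACGT]([ACGT]{%d}))" % spacer_len, seq):
        cut = m.start() + 3 + 3  # skip the CCN, then 3 bp into the protospacer
        if site_start <= cut < site_end:
            guides.append(("-", revcomp(m.group(1)), cut))
    return guides
```

Minus-strand guides are reported 5' to 3' on the minus strand (hence the reverse complement), while cut positions stay in plus-strand coordinates so they can be compared directly with binding-site intervals.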
  • Item
    Lantern: an integrative repository of functional annotations for lncRNAs in the human genome
    (BMC, 2021-05-26) Daulatabad, Swapna Vidhur; Srivastava, Rajneesh; Janga, Sarath Chandra; BioHealth Informatics, School of Informatics and Computing
    Background: With advancements in omics technologies, the range of biological processes in which long non-coding RNAs (lncRNAs) are involved is expanding extensively, generating the need for lncRNA annotation resources. Although there is a plethora of resources for annotating genes, resources with lncRNA ontology annotations remain rare despite the extensive corpus of lncRNA literature. Results: We present a lncRNA annotation extractor and repository (Lantern), developed using PubMed's abstract retrieval engine and NCBO's recommender annotation system. Lantern's annotations were benchmarked against lncRNAdb's manually curated free text. Benchmarking analysis suggested that Lantern has a recall of 0.62 against lncRNAdb for 182 lncRNAs and a precision of 0.8. Additionally, we annotated ~11,000 lncRNAs in the human genome with multiple omics annotations, including predicted cis-regulatory TFs, interactions with RBPs, tissue-specific expression profiles, protein co-expression networks, coding potential, sub-cellular localization, and SNPs, providing a one-stop dynamic visualization platform. Conclusions: Lantern combines annotations derived from a novel, accurate, semi-automatic ontology annotation engine with a variety of multi-omics annotations for lncRNAs, to provide a central web resource for dissecting the functional dynamics of long non-coding RNAs and to facilitate future hypothesis-driven experiments. The annotation pipeline and a web resource with current annotations for human lncRNAs are freely available at sysbio.lab.iupui.edu/lantern.
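The recall and precision reported for Lantern follow the standard set-based definitions. A minimal sketch, assuming each lncRNA's predicted ontology terms and its curated terms are available as sets; how the actual benchmark pooled or averaged across the 182 lncRNAs is not stated, so that part is left out. The function names and the example terms are hypothetical.

```python
# Illustrative sketch only; function names are hypothetical, not from Lantern.


def precision_recall(predicted, curated):
    """Set-based precision and recall of predicted annotation terms against a curated reference."""
    predicted, curated = set(predicted), set(curated)
    true_positives = len(predicted & curated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As a usage example, f1_score(0.8, 0.62) gives roughly 0.70, the harmonic mean of the precision and recall reported above.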
  • Item
    Clinical Features Distinguishing Diabetic Retinopathy Severity Using Artificial Intelligence
    (2022-07-29) Happe, Michael; Gill, Hunter; Salem, Doaa Hassan; Janga, Sarath Chandra; Hajrasouliha, Amir
    BACKGROUND AND HYPOTHESIS: One in 29 American diabetics suffers from diabetic retinopathy (DR), the weakening of blood vessels in the retina. DR goes undetected in nearly 50% of diabetics, allowing it to steal the vision of many Americans. We hypothesize that increasing the rate and ease of diagnosing DR by introducing artificial intelligence-based methods in primary medical clinics will improve the long-term preservation of ocular health in diabetic patients. PROJECT METHODS: This retrospective cohort study was conducted under approval from the Institutional Review Board of Indiana University School of Medicine. Images were de-identified, and no consent was obtained owing to the retrospective nature of the study. We categorized 676 patient files based on HbA1c and on the severity of non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR). Retinal images were annotated to identify common features of DR: microaneurysms, hemorrhages, cotton wool spots, exudates, and neovascularization. The VGG Image Annotator application used for annotation allowed us to save structure coordinates into a separate database for future training of the artificial intelligence system. RESULTS: 228 (33.7%) patients were diagnosed with diabetes, and 143 (62.7%) of those were diagnosed with DR. Two-sample t-tests found significant differences in HbA1c values between all diabetics and diabetics without retinopathy (p<0.007) and between all severities of DR and diabetics without retinopathy (p<0.002). In total, 283 eyes were diagnosed with a form of DR in this study: 37 mild NPDR, 42 moderate NPDR, 56 severe NPDR, and 148 PDR eyes. POTENTIAL IMPACT: With the dataset of coordinates and HbA1c values from this experiment, we aim to train an artificial intelligence system to diagnose DR through retinal imaging. The goal is for this system to be used conveniently in primary medical clinics to increase the detection rate of DR and preserve the ocular health of millions of future Americans.
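The abstract reports two-sample t-tests on HbA1c values without naming the variant. The sketch below assumes Welch's unequal-variance form and computes only the t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p-value needs a t-distribution CDF (for example from SciPy), which is omitted to keep the example dependency-free. The function name is hypothetical.

```python
# Illustrative sketch only; assumes Welch's t-test, which the abstract does not specify.
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

The reported proportions can be checked with the same arithmetic: 228/676 is about 33.7%, 143/228 is about 62.7%, and 37 + 42 + 56 + 148 = 283 eyes.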
  • Item
    Transcriptome-wide high-throughput mapping of protein–RNA occupancy profiles using POP-seq
    (Springer Nature, 2021-01-13) Srivastava, Mansi; Srivastava, Rajneesh; Janga, Sarath Chandra; BioHealth Informatics, School of Informatics and Computing
    Interaction between proteins and RNA is critical for post-transcriptional regulatory processes. Existing high-throughput methods based on crosslinking of protein–RNA complexes and poly-A pulldown are reported to introduce biases and are not readily amenable to identifying interaction sites on non-poly-A RNAs. We present Protein Occupancy Profile-Sequencing (POP-seq), a phase-separation-based method in three versions, one of which does not require crosslinking, thus providing unbiased protein occupancy profiles on the whole-cell transcriptome without the requirement of poly-A pulldown. Our study demonstrates that ~68% of the total POP-seq peaks overlapped with publicly available protein–RNA interaction profiles of 97 RNA binding proteins (RBPs) in K562 cells. We show that POP-seq variants consistently capture protein–RNA interaction sites across a broad range of genes, including transcripts encoding transcription factors (TFs) and RNA-Binding Proteins (RBPs), as well as long non-coding RNAs (lncRNAs). POP-seq-identified peaks exhibited a significant enrichment (p value < 2.2e−16) for GWAS SNPs and for phenotypic, clinically relevant germline as well as somatic variants reported in cancer genomes, suggesting the prevalence of uncharacterized genomic variation in protein-occupied sites on RNA. We demonstrate that the abundance of POP-seq peaks increases with increasing expression of lncRNAs, suggesting that highly expressed lncRNAs are likely to act as sponges for RBPs, contributing to the rewiring of the protein–RNA interaction network in cancer cells. Overall, our data support POP-seq as a robust and cost-effective method that could be applied to primary tissues for mapping global protein occupancies.
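The ~68% figure is an overlap fraction: the share of POP-seq peaks that intersect at least one published RBP interaction site. A minimal sketch, assuming BED-style half-open intervals and a 1-bp minimum overlap (the actual overlap criterion is not stated in the abstract); the helper names are hypothetical.

```python
# Illustrative sketch only; helper names are hypothetical, overlap criterion assumed to be >= 1 bp.
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def build_index(intervals):
    """Index half-open (chrom, start, end) intervals: sorted starts plus running max of ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end) shares at least 1 bp with any indexed interval on chrom."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    k = bisect_left(starts, end)  # intervals [0, k) start before the query ends
    return k > 0 and max_ends[k - 1] > start


def fraction_overlapping(peaks, reference):
    """Share of peaks overlapping at least one reference interval (e.g., published RBP sites)."""
    if not peaks:
        return 0.0
    index = build_index(reference)
    hits = sum(overlaps_any(index, c, s, e) for c, s, e in peaks)
    return hits / len(peaks)
```

Keeping a running maximum of end coordinates means each peak is answered with one binary search, and the check stays correct even when reference intervals are nested or overlap each other.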
  • Item
    Integrated miR-mRNA Network Underlying Hepatic Fat Accumulation in Humans
    Srivastava, Rajneesh; Wang, Xiaoliang; Lin, Jingmei; Wei, Rongrong; Chaturvedi, Praneet; Chalasani, Naga P.; Janga, Sarath Chandra; Liu, Wanqing
    Background: An integrated analysis of miRs and mRNAs in the development of Non-Alcoholic Fatty Liver Disease (NAFLD) and Non-Alcoholic Steatohepatitis (NASH) is lacking. We aimed to identify miRs, as well as the miR-mRNA regulatory network, involved in hepatic fat accumulation and human NAFLD. Materials and Methods: Hepatic fat content (HFC) was measured, and liver histology was characterized, for 73 liver tissue samples. MicroRNAs and mRNAs significantly associated with HFC were identified from genome-wide mRNA and miR expression profiling data. These miRs and mRNAs were then used to build miR-mRNA association networks in NAFLD and normal samples based on potential miR-mRNA targeting, and to conduct a pathway enrichment analysis. Results: We identified 62 miRs significantly correlated with HFC (p<0.05), with miR-518b and miR-19b showing the most significant positive and negative correlations with HFC, respectively (p<0.008 for both). Many miRs previously associated with NAFLD/NASH were also observed. Integrated network analysis indicated that a few miRs (miR-30b*, -616, -17*, -129-5p, -204, and -20a) controlled >80% of HFC-associated mRNAs in this network, and that the regulatory network was significantly rewired from normal to NAFLD. Pathway analyses revealed that inflammation pathways mediated by chemokine and cytokine signaling, Wnt signaling, Integrin signaling and Natural killer cell mediated cytotoxicity were enriched (p<0.05) in hepatic fat accumulation.
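The first step described above, identifying miRs whose expression tracks hepatic fat content, can be sketched as ranking miRs by the strength of their correlation with HFC. Pearson correlation is an assumption (the abstract does not name the measure), p-value calculation is omitted, and the data layout and miR names in the example are hypothetical.

```python
# Illustrative sketch only; assumes Pearson correlation. Names and data layout are hypothetical.
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(expression, hfc):
    """Rank miRs by |r| with hepatic fat content; expression maps miR name -> per-sample values."""
    scores = {mir: pearson_r(values, hfc) for mir, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

The sign of r separates positively correlated miRs (like miR-518b in the study) from negatively correlated ones (like miR-19b), while the ranking uses its magnitude.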
  • Item
    Direct RNA-sequencing of human cell lines for transcriptome-wide mapping and annotation of 3' tails at single molecule resolution
    Govindaraman, Aniruddhan; Quoseena, Mir; Kadumuri, Raja Shekar Vanna; Srivastava, Mansi; Srivastava, Rajneesh; Janga, Sarath Chandra
    The 3' endonucleolytic cleavage of pre-messenger RNA (pre-mRNA) and subsequent polyadenylation is a fundamental cellular process in eukaryotes. The 3' terminal regions of messenger RNA (mRNA) molecules are known to be polyadenylated by canonical poly(A) polymerases during RNA processing; however, they are also known to harbor additional UnMapped Regions (UMRs) composed of uridylation and guanylation [1]. Although short-read sequencing technologies are extensively used to study 3' terminal regions, major limitations of these approaches include their inability to detect homopolymeric sequences and to sequence full-length isoforms [1-2]. Nanopore sequencing enables long-read sequencing and identification of full-length transcripts at single-molecule resolution; however, there are currently no tools for systematically analyzing 3' terminal UMRs from direct RNA-sequencing datasets. Here, we present RAPTOR (https://github.com/aniram118/RAPTOR), a command-line tool for 3' terminal UMR analysis of nanopore direct RNA-sequencing data. RAPTOR provides a comprehensive report of UMR sequence information, cognate transcript annotations, nucleotide base composition, conserved hexamer signals and a range of analysis plots at single-molecule resolution. For benchmarking, we sequenced mRNA samples obtained from the HepG2 (liver hepatocellular carcinoma) and K562 (bone marrow chronic myelogenous leukemia) cell lines, yielding 243,802 and 598,428 reads, respectively. RAPTOR identified high-quality UMRs with median lengths of 201 and 173 nt in the HepG2 and K562 transcriptomes, respectively. Nucleotide composition analysis of the identified 3' UMRs showed an enrichment for A and U nucleotides in both HepG2 [A: 29%, U: 28%, G: 20%, C: 23%] and K562 [A: 30%, U: 29%, G: 19%, C: 22%] cells. Several high-confidence UMRs were verified by qPCR and Sanger sequencing, confirming sequence length and identity, respectively. In addition, de novo motif analysis of UMR regions enabled the discovery of several noncanonical motifs beyond poly(A/U) patterns. These UMR motifs were significantly associated (p-value < 0.01) with the established binding motifs of several known RNA-binding proteins, including SART3, HuR (ELAVL1), TIA1, IGF2BP2/3, PABPCs, PCBPs, SRSFs, HNRNPs and RBM /6, suggesting an unappreciated role of these RBPs in binding to the 3' tails of mRNAs.
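The composition percentages and hexamer signals reported for RAPTOR can be illustrated with a short sketch over a list of UMR sequences. Treating AAUAAA and AUUAAA (the canonical polyadenylation signals) as the "conserved hexamer signals" is an assumption, this is not RAPTOR's code, and the function names are hypothetical.

```python
# Illustrative sketch only; not RAPTOR's implementation. Function names are hypothetical.
from collections import Counter
from statistics import median


def to_rna(seq):
    """Upper-case and convert T to U so DNA- or RNA-alphabet input is handled alike."""
    return seq.upper().replace("T", "U")


def base_composition(seqs):
    """Percentage of A, C, G and U pooled across all sequences (other symbols are ignored)."""
    counts = Counter()
    for s in seqs:
        counts.update(to_rna(s))
    total = sum(counts[b] for b in "ACGU")
    if total == 0:
        return {b: 0.0 for b in "ACGU"}
    return {b: 100 * counts[b] / total for b in "ACGU"}


def hexamer_hits(seq, hexamers=("AAUAAA", "AUUAAA")):
    """Count (possibly overlapping) occurrences of each hexamer in one sequence."""
    seq = to_rna(seq)
    return {h: sum(1 for i in range(len(seq) - 5) if seq[i:i + 6] == h) for h in hexamers}


def summarize_umrs(seqs):
    """Median UMR length and pooled nucleotide composition."""
    return {"median_length": median(len(s) for s in seqs), "composition": base_composition(seqs)}
```

Pooling counts across all UMRs (rather than averaging per-read percentages) is one reasonable way to produce cell-line-level figures like those above; which convention RAPTOR uses is not stated.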
  • Item
    Structure and Constraints Imposed on the Network of miRNA Mediated Regulation of RNA-Binding Proteins in the Human Genome
    Srivastava, Rajneesh; Siddappa, Manjunath; Janga, Sarath Chandra
    MicroRNAs (miRs) and RNA-binding proteins (RBPs) mediate post-transcriptional regulation, with uncharacterized communication between them on a global scale, adding a new level of complexity to gene expression and regulation. In this study, we aimed to investigate miR control over RBPs relative to non-RBPs at the transcript level and its impact at the protein level. We predicted miR-targeted transcripts on a genome-wide scale using the TargetScan and miRanda algorithms and calculated the proportion of target transcripts (separately for RBPs and non-RBPs) controlled by each miR. These genome-wide miR-mRNA networks were analyzed for their impact on (a) the expression pattern of targeted transcripts across 16 tissue types [Human Body Map RNA-sequencing data, quantified by SAILFISH], (b) RBP transcript half-lives [HEK293] and (c) the protein abundance pattern of targets across 9 tissue types [Human Protein Atlas], using an equal binning approach with respect to the degree of miR regulation. We observed that the proportion of RBP transcripts controlled by miRs was significantly different from that of non-RBP transcripts (p-value < 2.2e-16). The number of RBP transcripts controlled by miRs exhibits a scaling distribution, with more than 50% of unique RBP transcripts targeted by 0.3% of the miRNAs. miRs extensively regulating RBP transcripts included miR-4739, miR-4728-5p, miR-608 and miR-149-3p, while 52% of RBP transcripts were targeted by 28% of miRs, which were weakly regulating. miRs exhibit a consistent controlling pattern over RBPs (further supported by half-lives) and non-RBPs at the transcript level in all tissue types, further supporting their involvement in transcript degradation/destabilization. However, miRs have no significant influence on RBP protein levels compared with non-RBPs. miRs were found to alter RBP transcript levels while protein levels remained unaltered across tissues, possibly suggesting an uncharacterized buffering mechanism that maintains high protein levels for RBPs. Our study therefore puts forward a means by which a high translation rate for RBP transcripts can ensure just-in-time production of RBPs across tissue types.
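The per-miR targeting proportions and the binning by degree of miR regulation can be sketched as follows, assuming predicted targets (for example, merged TargetScan and miRanda calls) are already available as a mapping from miR to transcript IDs. Whether the study's "equal binning" meant equal-width or equal-frequency bins is not stated; equal-frequency bins are assumed here, and all names are hypothetical.

```python
# Illustrative sketch only; equal-frequency binning is an assumption. Names are hypothetical.
from collections import Counter


def targeting_proportions(targets, rbps, non_rbps):
    """For each miR, the fraction of all RBP transcripts and of all non-RBP transcripts it targets."""
    return {
        mir: (len(set(tx) & rbps) / len(rbps), len(set(tx) & non_rbps) / len(non_rbps))
        for mir, tx in targets.items()
    }


def regulation_degree(targets):
    """Number of distinct miRs predicted to target each transcript (untargeted transcripts are absent)."""
    return Counter(t for tx in targets.values() for t in set(tx))


def equal_frequency_bins(degrees, n_bins):
    """Split transcripts, sorted by miR degree (ties broken by ID), into n_bins near-equal groups."""
    ranked = sorted(degrees, key=lambda t: (degrees[t], t))
    size, extra = divmod(len(ranked), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        end = start + size + (1 if b < extra else 0)
        bins.append(ranked[start:end])
        start = end
    return bins
```

Each bin can then be compared on transcript expression, half-life or protein abundance, which is the kind of contrast the abstract draws between transcript-level and protein-level effects.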
  • Item
    Genomic and mechanistic insights of convergent transcription in bacterial genomes
    Chetal, Kashish; Janga, Sarath Chandra
    Convergent gene pairs with overlapping head-to-head configuration are widely distributed across both eukaryotic and prokaryotic genomes. They are believed to contribute to the regulation of genes at both transcriptional and post-transcriptional levels, although the factors contributing to their abundance across genomes and the mechanistic basis for their prevalence are poorly understood. In this study, we explore the role of various factors contributing to convergent overlapping transcription in bacterial genomes. Our analysis shows that the proportion of convergent overlapping gene pairs (COGPs) in a genome is affected by endospore formation, bacterial habitat and temperature range. In particular, we show that the genomes of bacteria thriving in specialized habitats, such as thermophiles, exhibit a high proportion of COGPs. Our results also show that the density distribution of COGPs across genomes is high for shorter overlaps, with increased conservation of distances for decreasing overlaps. Our study also reveals that COGPs frequently contain stop codon overlaps, with the middle base exhibiting mismatches between complementary strands. Functional analysis using COG (Clusters of Orthologous Groups) annotations suggested that cell motility, cell metabolism, storage, and cell signaling are enriched among COGPs, suggesting their role in processes beyond regulation. Our analysis provides genomic insights into this unappreciated regulatory phenomenon, allowing a refined understanding of its contribution to bacterial phenotypes.
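A minimal sketch of how COGPs might be called from a gene annotation, treating a convergent overlapping pair as two adjacent genes where a plus-strand gene is followed by a minus-strand gene whose coordinates overlap, so that their 3' ends (and stop codons) meet. It assumes half-open coordinates on a single chromosome and ignores wrap-around on circular chromosomes and nested genes; the denominator used for the genome-level proportion (all adjacent gene pairs) is also an assumption. Function names are hypothetical.

```python
# Illustrative sketch only; pair definition and denominator are assumptions. Names are hypothetical.


def convergent_overlapping_pairs(genes):
    """Adjacent (+, -) gene pairs whose 3' ends overlap.

    genes: iterable of (name, start, end, strand) with half-open coordinates on one chromosome.
    Returns (upstream_name, downstream_name, overlap_bp) tuples.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for (n1, _, e1, st1), (n2, s2, e2, st2) in zip(ordered, ordered[1:]):
        # Convergent: upstream gene on +, downstream gene on -, transcribed toward a shared 3' region.
        if st1 == "+" and st2 == "-" and e1 > s2:
            pairs.append((n1, n2, min(e1, e2) - s2))
    return pairs


def cogp_proportion(genes):
    """Fraction of adjacent gene pairs that are convergent and overlapping."""
    genes = list(genes)
    n_adjacent = len(genes) - 1
    if n_adjacent <= 0:
        return 0.0
    return len(convergent_overlapping_pairs(genes)) / n_adjacent
```

The returned overlap lengths are what a density distribution of COGP overlaps, like the one described above, would be built from.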
  • Item
    Alternative Splicing Profile Comparison of Differentiating T-helper Cells to Dissect the Splicing Signatures of Th1, Th2, Th17 and Treg Cells
    Lakshmipati, Deepak Kumar; Quoseena, Mir; Ulrich, Benjamin; Kaplan, Mark; Janga, Sarath Chandra
    This study focuses on the contribution of Alternative Splicing (AS) events to the differentiation and post-differentiation functions of T-helper cells, specifically Th1, Th2, Th9, Th17 and Treg cells. T cell RNA-seq data from 72-hour and 2-week post-differentiation time points were analyzed for alternative splicing events using rMATS. We observed that the majority of significant events were Skipped Exon (SE) events, originating from a total of 1,556 genes, and that Intron Retention (RI) events were the second most abundant, occurring in 1,254 genes at 72 hours post-differentiation. These numbers were significantly lower at 2 weeks post-differentiation. PCR and qPCR validations confirmed scores of novel splicing event predictions. Results showed several SE events in KTN1, IL4RA, IL27, Hnrnpd, CREM and Arid4b, with different mRNA isoforms across multiple naïve vs. differentiated T cell combinations. Overall, RI event-associated genes were more prevalent (3,239 genes) than those exhibiting SE (2,810 genes). SE events were associated with 10.8% (Th17), 11.2% (Treg), 12.1% (Th2) and 13.9% (Th1) of the genes, and a similar trend was observed for RI events, with a prevalence of 12.2% (Th17), 12.5% (Treg), 14.2% (Th1) and 14.4% (Th2) of the genes. Gene ontology results showed that most of the genes with SE and RI events are involved in processes such as 'mRNA Processing', 'RNA Processing' and 'RNA Binding', and ontology results for retained introns also included p53 suppression proteins and regulated exocytosis of neurotransmitters and hormones. Introns consistently favored retention at the 3' end of the gene over the 5' end, with 430 genes showing intron retention events at the 3' end and 21 genes at the 5' end at the 72-hour time point. Across all cell types, enriched functional ontologies were consistently exclusive to genes showing RI at the 5' end versus the 3' end.
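rMATS summarizes each splicing event as a length-normalized inclusion level (PSI) per sample and reports the difference in mean inclusion between two groups. A minimal sketch of that point estimate, assuming inclusion and skipping read counts and the effective isoform lengths are already in hand (rMATS output carries them as IJC/SJC and IncFormLen/SkipFormLen); rMATS's statistical test for differential splicing is not reproduced, and the function names are hypothetical.

```python
# Illustrative sketch of the PSI point estimate only; function names are hypothetical.
from statistics import mean


def inclusion_level(inc_count, skip_count, inc_len, skip_len):
    """Length-normalized inclusion level (PSI) of one event in one sample."""
    inc = inc_count / inc_len     # inclusion reads per effective inclusion-isoform length
    skip = skip_count / skip_len  # skipping reads per effective skipping-isoform length
    total = inc + skip
    return inc / total if total else float("nan")


def inclusion_difference(group1, group2):
    """Mean PSI of group 1 minus mean PSI of group 2; each replicate is (I, S, inc_len, skip_len)."""
    return mean(inclusion_level(*r) for r in group1) - mean(inclusion_level(*r) for r in group2)
```

Length normalization matters because an inclusion isoform usually spans more junctions than the skipping isoform, so raw read counts alone would overstate inclusion.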