Sarath Janga
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan (Hubei, China) in December 2019, and the World Health Organization (WHO) has declared the resulting outbreak a pandemic because the virus spreads easily from human to human, making it a global public health concern. Coronaviruses are enveloped single-stranded ribonucleic acid (RNA) viruses with a characteristic “crown”-like appearance under two-dimensional transmission electron microscopy. Infections caused by these viruses can result in severe pneumonia, fever and breathing difficulty. The current lack of effective vaccines and antiviral medications has led to a global outbreak of SARS-CoV-2. Due to the rapidly evolving nature of coronaviruses, their identification has become increasingly challenging. Therefore, it is important to develop diagnostic methods that can detect the virus rapidly, to prevent its transmission. Currently, most clinical diagnostic tests for viruses depend on detecting a viral antigen or rely on PCR amplification of viral nucleic acid derived from biological samples. These two approaches offer trade-offs in benefits: antigen tests (including current Point-Of-Care Tests [POCT]) are typically rapid but have low sensitivity, while PCR is more time-consuming but also more sensitive. Irrespective of the test used, most clinical diagnostic facilities report a non-quantitative (binary) diagnostic result, and the data generated have limited capacity to inform insights into epidemiological linkage, vaccine efficacy, or antiviral susceptibility. Hence, there is an urgent need to generate new diagnostic tests that combine POCT, speed, sensitivity, detection of coinfection by other viral strains, and generation of quantitative or semi-quantitative data that can be used to identify drug resistance. Such data may also be used to reconstruct phylogeny to inform surveillance, public health strategy, and vaccine design.
Dr. Sarath Janga's lab has been using “third-generation” portable, real-time benchtop sequencers based on nanopores to develop novel experimental protocols and computational algorithms that not only detect the presence of pathogens but also map their variability across clinical samples, facilitating public health surveillance. More recently, his lab has been combining efficient, novel, high-throughput viral RNA isolation methods with nanopore sequencing to develop automated computational software for real-time detection of COVID-19.
Dr. Janga's research on detecting COVID-19 virus strains, aimed at developing a rapid, real-time and scalable test that can be used in clinics to help healthcare workers, who are at the front lines of care and are being exposed to infection, is another example of how IUPUI's faculty are TRANSLATING their RESEARCH INTO PRACTICE.
Recent Submissions
Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets
(BMC, 2021-07-07) Koonchanok, Ratanond; Daulatabad, Swapna Vidhur; Mir, Quoseena; Reda, Khairi; Janga, Sarath Chandra; Human-Centered Computing, School of Informatics and Computing
Background: Direct-sequencing technologies, such as Oxford Nanopore's, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia's interactive features complement existing computational approaches in nanopore-based RNA workflows.
The insights gleaned through visual analysis should help users develop rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
SliceIt: A genome-wide resource and visualization tool to design CRISPR/Cas9 screens for editing protein-RNA interaction sites in the human genome
(Elsevier, 2020-06) Vemuri, Sasank; Srivastava, Rajneesh; Mir, Quoseena; Hashemikhabir, Seyedsasan; Dong, X. Charlie; Janga, Sarath Chandra; BioHealth Informatics, School of Informatics and Computing
Several protein-RNA cross-linking protocols have been established in recent years to delineate the molecular interaction of an RNA Binding Protein (RBP) and its target RNAs. However, functional dissection of the role of the RBP binding sites in modulating the post-transcriptional fate of the target RNA remains challenging. The CRISPR/Cas9 genome editing system is commonly employed to perturb both coding and noncoding regions in the genome. With the advancements in genome-scale CRISPR/Cas9 screens, it is now possible to not only perturb specific binding sites but also probe the global impact of protein-RNA interaction sites across cell types. Here, we present SliceIt (http://sliceit.soic.iupui.edu/), a database holding an in silico sgRNA (single guide RNA) library to facilitate conducting such high-throughput screens. SliceIt comprises ~4.8 million unique sgRNAs, with an estimated range of 2-8 sgRNAs designed per RBP binding site, for eCLIP experiments of >100 RBPs in HepG2 and K562 cell lines from the ENCODE project. SliceIt provides a user-friendly environment, developed using the Elasticsearch search engine framework. It is available in both table and genome browser views, facilitating easy navigation of RBP binding sites, designed sgRNAs, and exon expression levels across 53 human tissues, along with the prevalence of SNPs and GWAS hits on binding sites.
Exon expression profiles enable examination of locus-specific changes proximal to the binding sites. Users can also upload custom tracks in various file formats directly onto the genome browser, to navigate additional genomic features in the genome and compare with other types of omics profiles. All the binding site-centric information is dynamically accessible via "search by gene", "search by coordinates" and "search by RBP" options and readily available to download. Validation of the sgRNA library in SliceIt was performed by selecting RBP binding sites in the Lipt1 gene and designing sgRNAs. The effect of CRISPR/Cas9 perturbations on the selected binding sites in the HepG2 cell line was confirmed based on altered proximal exon expression levels using qPCR, further supporting the utility of the resource to design experiments for perturbing protein-RNA interaction networks. Thus, SliceIt provides a one-stop guide RNA library to perturb RBP binding sites, along with several layers of functional information to design both low- and high-throughput CRISPR/Cas9 screens, for studying the phenotypes and diseases associated with RBP binding sites.
Lantern: an integrative repository of functional annotations for lncRNAs in the human genome
(BMC, 2021-05-26) Daulatabad, Swapna Vidhur; Srivastava, Rajneesh; Janga, Sarath Chandra; BioHealth Informatics, School of Informatics and Computing
Background: With advancements in omics technologies, the range of biological processes in which long non-coding RNAs (lncRNAs) are involved is expanding extensively, thereby generating the need to develop lncRNA annotation resources. Although there is a plethora of resources for annotating genes, and despite the extensive corpus of lncRNA literature, resources with lncRNA ontology annotations are rare. Results: We present a lncRNA annotation extractor and repository (Lantern), developed using PubMed's abstract retrieval engine and NCBO's recommender annotation system.
Lantern's annotations were benchmarked against lncRNAdb's manually curated free text. Benchmarking analysis suggested that Lantern has a recall of 0.62 against lncRNAdb for 182 lncRNAs and precision of 0.8. We additionally annotated lncRNAs with multiple omics annotations, including predicted cis-regulatory TFs, interactions with RBPs, tissue-specific expression profiles, protein co-expression networks, coding potential, sub-cellular localization, and SNPs for ~11,000 lncRNAs in the human genome, providing a one-stop dynamic visualization platform. Conclusions: Lantern integrates annotations derived from a novel, accurate, semi-automatic ontology annotation engine with a variety of multi-omics annotations for lncRNAs, to provide a central web resource for dissecting the functional dynamics of long non-coding RNAs and to facilitate future hypothesis-driven experiments. The annotation pipeline and a web resource with current annotations for human lncRNAs are freely available at sysbio.lab.iupui.edu/lantern.
Clinical Features Distinguishing Diabetic Retinopathy Severity Using Artificial Intelligence
(2022-07-29) Happe, Michael; Gill, Hunter; Salem, Doaa Hassan; Janga, Sarath Chandra; Hajrasouliha, Amir
BACKGROUND AND HYPOTHESIS: 1 in 29 American diabetics suffers from diabetic retinopathy (DR), the weakening of blood vessels in the retina. DR goes undetected in nearly 50% of diabetics, allowing DR to steal the vision of many Americans. We hypothesize that increasing the rate and ease of diagnosing DR by introducing artificial intelligence-based methods in primary medical clinics will increase the long-term preservation of ocular health in diabetic patients. PROJECT METHODS: This retrospective cohort study was conducted under approval from the Institutional Review Board of Indiana University School of Medicine. Images were deidentified and no consent was taken due to the nature of this retrospective study.
We categorized 676 patient files based upon HbA1c, severity of non-proliferative diabetic retinopathy (NPDR), and proliferative diabetic retinopathy (PDR). Retinal images were annotated to identify common features of DR: microaneurysms, hemorrhages, cotton wool spots, exudates, and neovascularization. The VGG Image Annotator application used for annotations allowed us to save structure coordinates into a separate database for future training of the artificial intelligence system. RESULTS: 228 (33.7%) patients were diagnosed with diabetes, and 143 (62.7%) of those were diagnosed with DR. Two-sample t tests found significant differences between the HbA1c values of all diabetics compared to diabetics without retinopathy (p<0.007) and between all severities of DR versus diabetics without retinopathy (p<0.002). 283 eyes were diagnosed with a form of DR in this study: 37 mild NPDR, 42 moderate NPDR, 56 severe NPDR, and 148 PDR eyes. POTENTIAL IMPACT: With the dataset of coordinates and HbA1c values from this experiment, we aim to train an artificial intelligence system to diagnose DR through retinal imaging. The goal is for this system to be used conveniently in primary medical clinics to increase the detection rate of DR and preserve the ocular health of millions of future Americans.
Transcriptome-wide high-throughput mapping of protein–RNA occupancy profiles using POP-seq
(Springer Nature, 2021-01-13) Srivastava, Mansi; Srivastava, Rajneesh; Janga, Sarath Chandra; BioHealth Informatics, School of Informatics and Computing
Interaction between proteins and RNA is critical for post-transcriptional regulatory processes. Existing high-throughput methods based on crosslinking of the protein–RNA complexes and poly-A pulldown are reported to contribute to biases and are not readily amenable to identifying interaction sites on non-poly-A RNAs.
We present Protein Occupancy Profile-Sequencing (POP-seq), a phase separation-based method in three versions, one of which does not require crosslinking, thus providing unbiased protein occupancy profiles on the whole-cell transcriptome without the requirement of poly-A pulldown. Our study demonstrates that ~68% of the total POP-seq peaks exhibited an overlap with publicly available protein–RNA interaction profiles of 97 RNA binding proteins (RBPs) in K562 cells. We show that POP-seq variants consistently capture protein–RNA interaction sites across a broad range of genes, including transcripts encoding transcription factors (TFs), RNA-Binding Proteins (RBPs) and long non-coding RNAs (lncRNAs). POP-seq-identified peaks exhibited a significant enrichment (p value < 2.2e−16) for GWAS SNPs, phenotypic, clinically relevant germline as well as somatic variants reported in cancer genomes, suggesting the prevalence of uncharacterized genomic variation in protein-occupied sites on RNA. We demonstrate that the abundance of POP-seq peaks increases with an increase in expression of lncRNAs, suggesting that highly expressed lncRNAs are likely to act as sponges for RBPs, contributing to the rewiring of the protein–RNA interaction network in cancer cells. Overall, our data support POP-seq as a robust and cost-effective method that could be applied to primary tissues for mapping global protein occupancies.
Integrated miR-mRNA Network Underlying Hepatic Fat Accumulation in Humans
Srivastava, Rajneesh; Wang, Xiaoliang; Lin, Jingmei; Wei, Rongrong; Chaturvedi, Praneet; Chalasani, Naga P.; Janga, Sarath Chandra; Liu, Wanqing
Background: An integrated analysis of miRs and mRNAs in the development of Non-Alcoholic Fatty Liver Disease (NAFLD) and Non-Alcoholic Steatohepatitis (NASH) is lacking. We aimed to identify miRs as well as the miR-mRNA regulatory network involved in hepatic fat accumulation and human NAFLD.
Materials and Methods: Hepatic fat content (HFC) was measured, and liver histology was characterized for 73 liver tissue samples. MicroRNAs and mRNAs significantly associated with HFC were identified based on genome-wide mRNA and miR expression profiling data. These miRs and mRNAs were further used to build miR-mRNA association networks in NAFLD and normal samples based on the potential miR-mRNA targeting, as well as to conduct a pathway enrichment analysis. Results: We identified 62 miRs significantly correlated with HFC (p<0.05), with miR-518b and miR-19b showing the most significant positive and negative correlations with HFC, respectively (p<0.008 for both). Many miRs that were previously associated with NAFLD/NASH were also observed. Integrated network analysis indicated that a few miRs (30b*, 616, 17*, 129-5p, 204, and 20a) controlled >80% of HFC-associated mRNAs in this network, and the regulation network was significantly rewired from normal to NAFLD. Pathway analyses revealed that inflammation pathways mediated by chemokine and cytokine signaling, Wnt signaling, integrin signaling and natural killer cell mediated cytotoxicity were enriched (p<0.05) in hepatic fat accumulation.
Kidney Specific Regulatory Network in Mouse Uncovers Functional, Evolutionary and Disease Dynamics
Hashemikhabir, Seyedsasan; Srivastava, Rajneesh; Janga, Sarath Chandra
Transcription factors (TFs) operate in a combinatorial fashion to regulate the expression of a gene or a group of genes; however, their tissue-specific regulatory interactions are not fully characterized. In this study, we construct and investigate a kidney-specific regulatory (KSR) network for mouse. We obtained upstream regions of genes in the mouse genome from ENSEMBL and extracted DNase I hypersensitive sites (DHS) for 8-week mouse kidney from the ENCODE project. Similarly, the position weight matrices (PWMs) for TF binding motifs (BMo) were extracted from JASPAR, Jolma and TRANSFAC and mapped to the mouse genome using FIMO. These BMo were integrated with the obtained DHS signals (narrow peak) in 5 kb upstream regions. The resulting TFs and their targeted genes were modeled as a directed interaction network comprising 619 TFs and their corresponding 13,500 target genes. We trimmed the resulting network by keeping only the genes that function as TFs. The resulting TF-TF network (of 619 nodes) was analyzed to provide a holistic picture of TF-TF interactions in mouse kidney tissue, while the global network was studied for conservation across 61 species and relevance in kidney-associated diseases. We observed that genes related to diseases were significantly enriched in the second and third layers of the network hierarchy. Conservation analysis of the mouse KSR revealed >50% conservation in close relatives such as rat, human, dog and squirrel, and lower conservation in invertebrates and yeast, suggesting that network complexity increases with increasing kidney functionality from lower to higher species. In addition, the mouse KSR was examined in its closest relative, rat, for segments of the nephron: TAL (thick ascending limb), PT (proximal tubules) and IMCD (inner medullary collecting duct), which revealed a significant enrichment of TFs for their corresponding original group in the mouse KSR. Further, this network was investigated in diverse model kidney diseases such as hypertension, diabetes and kidney renal clear cell carcinoma (KIRC). The compendium of the network reported in this study can form a roadmap for increasing our understanding of the variations in regulatory wiring in kidney diseases.
Prediction and Evolutionary Analysis of RNA Binding Proteins Across Eukaryotic Genomes
Hassan, Huzaifa; Janga, Sarath Chandra
RNA Binding Proteins (RBPs) are key players in several post-transcriptional regulatory mechanisms and mediate the metabolism of RNA in the cell.
High-throughput technologies such as cross-linking followed by Mass Spectrometry (MS) have led to the identification of a large number of RBPs and the RNA binding domains (RBDs) encoded by them. Although experimental methods have increased the repertoire of RBPs in model systems, the repertoire of RBPs across eukaryotic species is far from complete. In this study, we developed a computational pipeline to predict RNA binding proteins using RNA binding domains and protein homology information. Our approach used RNA-binding peptides from 529 RBPs and a dataset of 1344 experimentally known human RBPs as a reference set. Domain-based predictions using HMMER were integrated with homology information to obtain an integrated genome-wide prediction of RBPs across 69 species. Benchmarking of these predictions against mouse genes annotated as RBPs resulted in a precision of 60% and recall of 75%. An average of 1750 RBPs were identified across eukaryotes comprising mammals, birds, amphibians, insects and worms. Although RBPs were found to be highly conserved across the phylogenetic spectrum, a few lower-order species such as lamprey, Caenorhabditis elegans and yeast exhibited fewer RBPs encoded in their genomes, suggestive of the divergence of the RBP repertoire in distant relatives. In contrast to Transcription Factors (TFs) and kinases, genes encoding RBPs exhibited an increase in their number (p-value: 0.0013) with an increase in genome size. Although the majority (56%) of the RNA binding regions could be mapped to the domains present in the Pfam database, a small fraction of the unmapped novel domains were detected in >1% of protein coding genes analyzed across genomes. A co-occurrence network of RBDs revealed prominent enrichment of Nup160, WD40 and RRM domains with other RBDs across eukaryotic genomes.
Our proposed prediction pipeline and corresponding repertoire of RBPs would serve as a valuable resource for studying post-transcriptional regulatory networks across eukaryotic species.
Structure and Constraints Imposed on the Network of miRNA Mediated Regulation of RNA-Binding Proteins in the Human Genome
Srivastava, Rajneesh; Siddappa, Manjunath; Janga, Sarath Chandra
MicroRNAs (miRs) and RNA-binding proteins (RBPs) mediate post-transcriptional regulation with uncharacterized communication among themselves on a global scale, thus adding a new level of complexity to gene expression and regulation. In this study, we aimed to investigate miR control over RBPs relative to non-RBPs at the transcript level, and its impact at the protein level. We predicted miR-targeted transcripts on a genome-wide scale using the TargetScan and miRanda algorithms and calculated the proportion of target transcripts (separately for RBPs and non-RBPs) controlled by each miR. These genome-wide miR-mRNA networks were analyzed for their impact on (a) the targeted transcripts' expression pattern [Human body map RNA sequence data, quantified by SAILFISH] across 16 tissue types, (b) RBPs' transcript half-life [HEK293] and (c) the targets' protein abundance pattern (Human Protein Atlas) across 9 tissue types, using an equal binning approach with respect to the degree of miR regulation. We observed that the proportion of RBP transcripts controlled by miRs was significantly different from that for non-RBP transcripts (p-value < 2.2e-16). The number of RBP transcripts controlled by miRs exhibits a scaling distribution, with more than 50% of unique RBP transcripts targeted by 0.3% of the miRNAs. miRs extensively regulating RBP transcripts included miR-4739, miR-4728-5p, miR-608 and miR-149-3p, while 52% of RBP transcripts were targeted by 28% weakly regulating miRs.
miRs exhibit a consistent controlling pattern over RBPs (further supported by half-lives) and non-RBPs at the transcript level in all tissue types, further justifying their involvement in degradation/destabilization. However, miRs have no significant influence over RBPs' protein levels when compared to non-RBPs. miRs were found to alter RBPs' transcript levels, while protein levels remained unaltered across tissues, possibly suggesting an uncharacterized buffering mechanism which maintains high protein levels for RBPs. Our study therefore puts forward a means by which a high translation rate for RBP transcripts can ensure just-in-time protein production of RBPs across tissue types.
Abundance of Secondary Metabolites in Human Microbiome
Sarsani, Vishal; Kulkarni, Nikhil; Janga, Sarath Chandra
The human body harbors a highly complex microbial ecosystem. Bacteria that have co-evolved within a human context have barely been explored for secondary metabolites. These secondary metabolites are hypothesized to possess biological activities significant within the human host context. In this study, we examined conservation profiles of 203 secondary metabolite gene clusters across 16 human body sites and found that the gastrointestinal tract and oral sites show the highest conservation for secondary metabolic gene clusters. We observed that the majority of highly conserved metabolites belong to the NRPS pathway type. Our phylogenetic analysis of highly conserved stool and oral samples revealed an abundance of the phyla Firmicutes, Bacteroidetes and Actinobacteria.