Browsing by Author "Chen, Jake Yue"
Now showing 1 - 10 of 26
Item Computational Analysis of Drought Stress-Associated miRNAs and miRNA Co-Regulation Network in Physcomitrella patens. (Elsevier, 2011-04) Wan, Ping; Wu, Jun; Zhou, Yuan; Xiao, Junshu; Feng, Jie; Zhao, Weizhong; Xiang, Shen; Jiang, Guanglong; Chen, Jake Yue; Department of Biohealth Informatics, IU School of Informatics and Computing
miRNAs are non-coding small RNAs that are involved in diverse biological processes. Until now, little has been known about their roles in plant drought resistance. Physcomitrella patens is highly tolerant to drought; however, the basic biology of the traits that give P. patens this important characteristic is not clear. In this work, we discovered 16 drought stress-associated miRNA (DsAmR) families in P. patens through computational analysis. Due to the possible discrepancy of expression periods and tissue distributions between potential DsAmRs and their targeting genes, and the existence of false positive results in computational identification, the prediction results should be examined with further experimental validation. We also constructed an miRNA co-regulation network, and identified two network hubs, miR902a-5p and miR414, which may play important roles in regulating drought-resistance traits. We distributed our results through an online database named ppt-miRBase, which can be accessed at http://bioinfor.cnu.edu.cn/ppt_miRBase/index.php. Our methods for finding DsAmRs and the miRNA co-regulation network showed a new direction for identifying miRNA functions.

Item Computational Biomarker Discovery: From Systems Biology to Predictive and Personalized Medicine Applications (Office of the Vice Chancellor for Research, 2010-04-09) Chen, Jake Yue; Wu, Xiaogang; Zhang, Fan; Pandey, Ragini; Huang, Hui; Huan, Tianxiao
With the advent of Genome-based Medicine, there is an escalating need for discovering how the modifications of biological molecules, either individually or as an ensemble, can be uniquely associated with human physiological states. 
This knowledge could lead to breakthroughs in the development of clinical tests known as "biomarker tests" to assess disease risks, early onset, prognosis, and treatment outcome predictions. Therefore, development of molecular biomarkers is a key agenda in the next 5-10 years to take full advantage of the human genome to improve human well-being. However, the complexity of human biological systems and the imperfections of high-throughput biological instruments and their results have created significant hurdles in biomarker development. Only recently have computational methods become an important part of this research area, in which conventional molecular biomarker development has proven both extremely slow and cost-ineffective. At the Indiana Center for Systems Biology and Personalized Medicine, we are developing several computational systems biology strategies to address these challenges. We will show examples of how we approach the problem using a variety of computational techniques, including data mining, algorithm development that takes biological contexts into account, biological knowledge integration, and information visualization. Finally, we outline how research in this direction to derive more robust molecular biomarkers may lead to predictive and personalized medicine. The Indiana Center for Systems Biology and Personalized Medicine (CSBPM) was founded in 2007 as an IUPUI signature center by Dr. Jake Chen and his colleagues in the Indiana University School of Informatics, School of Medicine, and School of Science. CSBPM is the only research center in the State of Indiana with the primary goal of pursuing predictive and personalized medicine. CSBPM currently consists of eleven faculty members from the School of Medicine, School of Science, School of Engineering, School of Informatics, and Indiana University Simon Cancer Center. 
The primary mission of the center is to foster the development and use of systems biology and computational modeling techniques to address challenges in future genome-based medicine. The ultimate goal of the center is to shorten the discovery-to-practice gap between integrative "Omics" biology studies (including genomics, transcriptomics, proteomics, and metabolomics) and predictive and personalized medicine applications.

Item Discovery of pathway biomarkers from coupled proteomics and systems biology methods (BMC, 2010-11-02) Zhang, Fan; Chen, Jake Yue; BioHealth Informatics, School of Informatics and Computing
Background: Worldwide, breast cancer is the second most common type of cancer after lung cancer. Plasma proteome profiling may have a higher chance of identifying protein changes between plasma samples such as normal and breast cancer tissues. Breast cancer cell lines have long been used by researchers as a model system for identifying protein biomarkers. A comparison of the set of proteins which change in plasma with previously published findings from proteomic analysis of human breast cancer cell lines may identify with a higher confidence a subset of candidate protein biomarkers. Results: In this study, we analyzed a liquid chromatography (LC) coupled tandem mass spectrometry (MS/MS) proteomics dataset from plasma samples of 40 healthy women and 40 women diagnosed with breast cancer. Using two-sample t-statistics and a permutation procedure, we identified 254 statistically significant, differentially expressed proteins, among which 208 are over-expressed and 46 are under-expressed in breast cancer plasma. We validated this result against previously published proteomic results of human breast cancer cell lines and signaling pathways to derive 25 candidate protein biomarkers in a panel. 
Using the pathway analysis, we observed that the 25 “activated” plasma proteins were present in several cancer pathways, including ‘Complement and coagulation cascades’, ‘Regulation of actin cytoskeleton’, and ‘Focal adhesion’, and matched well with previously reported studies. Additional gene ontology analysis of the 25 proteins also showed that cellular metabolic process and response to external stimulus (especially proteolysis and acute inflammatory response) were enriched functional annotations of the proteins identified in the breast cancer plasma samples. By cross-validation using two additional proteomics studies, we obtained 86% and 83% similarities in pathway-protein matrix between the first study and the two testing studies, which is much better than the similarity we measured with proteins. Conclusions: We presented a ‘systems biology’ method to identify, characterize, analyze and validate panel biomarkers in breast cancer proteomics data, which includes 1) t statistics and permutation process, 2) network, pathway and function annotation analysis, and 3) cross-validation of multiple studies. Our results showed that the systems biology approach is essential to understanding the molecular mechanisms of panel protein biomarkers.

Item DMAP: a connectivity map database to enable identification of novel drug repositioning candidates (BioMed Central, 2015-09-25) Huang, Hui; Nguyen, Thanh; Ibrahim, Sara; Shantharam, Sandeep; Yue, Zongliang; Chen, Jake Yue; Department of Computer & Information Science, School of Science
BACKGROUND: Drug repositioning is a cost-efficient and time-saving approach to drug development compared to traditional techniques. A systematic method for drug repositioning is to identify a candidate drug's gene expression profiles in target disease models and determine how similar these profiles are to those of approved drugs. Databases such as the CMAP have been developed recently to help with systematic drug repositioning. 
METHODS: To overcome the limitation of connectivity maps on data coverage, we constructed a comprehensive in silico drug-protein connectivity map called DMAP, which contains directed drug-to-protein effects and effect scores. The drug-to-protein effect scores are compiled from all database entries in which a relationship between the drug and protein has been previously observed, and provide a confidence measure on the quality of such drug-to-protein effects. RESULTS: In DMAP, we have compiled the direct effects between 24,121 PubChem Compound IDs (CIDs), which were mapped from 289,571 chemical entities recognized from public literature, and 5,196 reviewed Uniprot proteins. DMAP compiles a total of 438,004 chemical-to-protein effect relationships. Compared to CMAP, DMAP shows a 221-fold increase in the number of chemicals and a 1.92-fold increase in the number of ATC codes. Furthermore, by overlapping DMAP chemicals with the approved drugs with known indications from the TTD database and literature, we obtained 982 drugs and 622 diseases; meanwhile, we only obtained 394 drugs with known indications from CMAP. To validate the feasibility of applying the new DMAP for systematic drug repositioning, we compared the performance of DMAP and the well-known CMAP database on two popular computational techniques: a drug-drug-similarity-based method with leave-one-out validation and a Kolmogorov-Smirnov scoring based method. In the drug-drug-similarity-based method, the drug repositioning prediction using DMAP achieved an Area-Under-Curve (AUC) score of 0.82, compared with AUC = 0.64 using CMAP. For the Kolmogorov-Smirnov scoring based method, with DMAP, we were able to retrieve several drug indications which could not be retrieved using CMAP. DMAP data can be queried using the existing C2MAP server or downloaded freely at: http://bio.informatics.iupui.edu/cmaps CONCLUSIONS: Reliable measurements of how drugs affect disease-related proteins are critical to ongoing drug development in the genome medicine era. 
We demonstrated that DMAP can help drug development professionals assess drug-to-protein relationship data and improve chances of success for systematic drug repositioning efforts.

Item Graft-Versus-Host Disease-Free Antitumoral Signature After Allogeneic Donor Lymphocyte Injection Identified by Proteomics and Systems Biology (American Society of Clinical Oncology, 2019) Liu, Xiaowen; Yue, Zongliang; Cao, Yimou; Taylor, Lauren; Zhang, Qing; Choi, Sung W.; Hanash, Samir; Ito, Sawa; Chen, Jake Yue; Wu, Huanmei; Paczesny, Sophie; Pediatrics, School of Medicine
PURPOSE: As a tumor immunotherapy, allogeneic hematopoietic cell transplantation with subsequent donor lymphocyte injection (DLI) aims to induce the graft-versus-tumor (GVT) effect but often also leads to acute graft-versus-host disease (GVHD). Plasma tests that can predict the likelihood of GVT without GVHD are still needed. PATIENTS AND METHODS: We first used an intact-protein analysis system to profile the plasma proteome post-DLI of patients who experienced GVT and acute GVHD for comparison with the proteome of patients who experienced GVT without GVHD in a training set. Our novel six-step systems biology analysis involved removing common proteins and GVHD-specific proteins, creating a protein-protein interaction network, calculating relevance and penalty scores, and visualizing candidate biomarkers in gene networks. We then performed a second proteomics experiment in a validation set of patients who experienced GVT without acute GVHD after DLI for comparison with the proteome of patients before DLI. We next combined the two experiments to define a biologically relevant signature of GVT without GVHD. An independent experiment with single-cell profiling in tumor antigen-activated T cells from a patient with post-hematopoietic cell transplantation relapse was performed. 
RESULTS: The approach provided a list of 46 proteins in the training set, and 30 proteins in the validation set were associated with GVT without GVHD. The combination of the two experiments defined a unique 61-protein signature of GVT without GVHD. Finally, the single-cell profiling in activated T cells found 43 of the 61 genes. Novel markers, such as RPL23, ILF2, CD58, and CRTAM, were identified and could be extended to other antitumoral responses. CONCLUSION: Our multiomic analysis provides, to our knowledge, the first human plasma signature for GVT without GVHD. Risk stratification on the basis of this signature would allow for customized treatment plans.

Item HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions (BioMed Central, 2017-02-17) Chen, Jake Yue; Pandey, Ragini; Nguyen, Thanh M.; Department of Biohealth Informatics, School of Informatics and Computing
BACKGROUND: Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine. RESULTS: The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPI) linked by 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs contain both physical/direct interactions and high-quality functional/indirect interactions. 
Compared with the HAPPI 1.0 database release, HAPPI database version 2.0 (HAPPI-2) represents a 485% increase in human PPI data coverage and a 73% increase in protein coverage. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by academic users at http://discovery.informatics.uab.edu/HAPPI . CONCLUSIONS: While the underlying data for HAPPI-2 are integrated from diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.

Item HAPPI: A Bioinformatics Database Platform Enabling Network Biology Studies (2006-06-29T19:05:24Z) Mamidipalli, SudhaRani; Chen, Jake Yue
The publication of the draft human genome consisting of 30,000 genes is merely the beginning of genome biology. A new way to understand the complexity and richness of molecular and cellular function of proteins in biological processes is through understanding of biological networks. These networks include protein-protein interaction networks, gene regulatory networks, and metabolic networks. In this thesis, we focus on human protein-protein interaction networks using informatics techniques. First, we performed a thorough literature survey to document different experimental methods to detect and collect protein interactions, current public databases that store these interactions, and computational software to predict, validate, and interpret protein networks. Then, we developed the Human Annotated Protein-Protein Interaction (HAPPI) database to manage a wealth of integrated information related to protein functions, protein-protein functional links, and protein-protein interactions. 
Approximately 12900 proteins from Swissprot, 57900 proteins from Trembl, 52186 protein-domains from Swisspfam, 4084 gene-pathways from KEGG, 2403190 interactions from STRING and 51207 interactions from OPHID public databases were integrated into a single relational database platform using Oracle 10g on an IU Supercomputing grid. We further assigned a confidence score to each protein interaction pair to help assess the quality and reliability of protein-protein interaction. We hosted the database on the Discovery Informatics and Computing web site, which is now publicly accessible. The HAPPI database differs from other protein interaction databases in the following aspects: 1) It focuses on human protein interactions and contains approximately 860000 high-confidence protein interaction records, making it one of the most complete and reliable sources of human protein interaction today; 2) It includes thorough protein domain, gene and pathway information of interacting proteins, therefore providing a whole view of protein functional information; 3) It contains a consistent ranking score that can be used to gauge the confidence of protein interactions. To show the benefits of the HAPPI database, we performed a case study using the Insulin Signaling pathway in collaboration with a biology team on campus. We began by taking two sets of proteins that were previously well studied as separate processes, set A and set B. We queried these proteins against the HAPPI database, and derived high-confidence protein interaction data sets annotated with known KEGG pathways. We then organized these protein interactions on a network diagram. The end result shows many novel hub proteins that connect set A or B proteins. 
Some hub proteins are even novel members outside of any annotated pathway, making them interesting targets to validate for subsequent biological studies.

Item An integrated proteomics analysis of bone tissues in response to mechanical stimulation (2010-07) Li, Jillian; Zhang, Fan; Chen, Jake Yue
Bone cells can sense physical forces and convert mechanical stimulation conditions into biochemical signals that lead to expression of mechanically sensitive genes and proteins. However, it is still poorly understood how genes and proteins in bone cells are orchestrated to respond to mechanical stimulations. In this research, we applied integrated proteomics, statistical, and network biology techniques to study proteome-level changes to bone tissue cells in response to two different conditions, normal loading and fatigue loading. We harvested ulna midshafts and isolated proteins from the control, loaded, and fatigue-loaded rats. Using a label-free liquid chromatography tandem mass spectrometry (LC-MS/MS) experimental proteomics technique, we derived a comprehensive list of 1,058 proteins that are differentially expressed among normal loading, fatigue loading, and controls. By carefully developing protein selection filters and statistical models, we were able to identify 42 proteins representing 21 rat genes that were significantly associated with bone cells' response to quantitative changes between normal loading and fatigue loading conditions. We further applied network biology techniques by building a fatigue loading activated protein-protein interaction subnetwork involving 9 of the human homolog counterparts of the 21 rat genes in a large connected network component. Our study shows that the combination of decreased anti-apoptotic factor, Raf1, and increased pro-apoptotic factor, PDCD8, results in significant increase in the number of apoptotic osteocytes following fatigue loading. 
We believe controlling osteoblast differentiation/proliferation and osteocyte apoptosis could be promising directions for developing future therapeutic solutions for related bone diseases.

Item INTEGRATIVE SYSTEM BIOLOGY STUDIES ON HIGH THROUGHPUT GENOMICS AND PROTEOMICS DATASET (2012-03-20) Sonachalam, Madhankumar; Chen, Jake Yue; Shen, Li; Zhou, Yaoqi
The post-genomic era has propelled us to the view that biological systems are complex networks of interacting genes, proteins and small molecules that give rise to biological form and function. The past decade has seen the advent of a number of new technologies designed to study biological systems on a genome-wide scale. These new technologies offer insight into the activity of thousands of genes and proteins in the cell, thereby changing the conventional reductionist view of these systems. However, the deluge of data surpasses the analytical and critical abilities of researchers and thereby demands the development of new computational methods. The challenge no longer lies in the acquisition of expression profiles, but rather in the interpretation of the results to gain insights into biological mechanisms. In three different case studies, we applied various system biology techniques on publicly available and in-house genomics and proteomics data sets to identify sub-network signatures. In the first study, we integrated prior knowledge from gene signatures, GSEA and gene/protein network modeling to identify pathways involved in colorectal cancer, while in the second, we identified plasma-based network signatures for Alzheimer's disease by combining various feature selection and classification approaches. 
In the final study, we did an integrated miRNA-mRNA analysis to identify the role of Myeloid Derived Stem Cells (MDSCs) in T-Cell suppression.

Item A method for identifying discriminative isoform-specific peptides for clinical proteomics application (BioMed Central, 2016-08-22) Zhang, Fan; Chen, Jake Yue; Department of Biohealth Informatics, IU School of Informatics and Computing
BACKGROUND: Clinical proteomics application aims at solving a specific clinical problem within the context of a clinical study. It has been growing rapidly in the field of biomarker discovery, especially in the area of cancer diagnostics. Until recently, protein isoform has not been viewed as a new class of early diagnostic biomarkers for clinical proteomics. A protein isoform is one of the different forms of the same protein. Different forms of a protein may be produced from single-nucleotide polymorphisms (SNPs), alternative splicing, or post-translational modifications (PTMs). Previous studies have shown that protein isoforms play critical roles in tumorigenesis, disease diagnosis, and prognosis. Identifying and characterizing protein isoforms are essential to the study of molecular mechanisms and early detection of complex diseases such as breast cancer. However, there are limitations with traditional methods used for protein isoform determination, such as EST sequencing, microarray profiling (exon array, exon-exon junction array), and mRNA next-generation sequencing: 1) not at the protein level, 2) no information about the connectivity of nonadjacent exons, 3) no SNPs and PTMs, and 4) low reproducibility. 
Moreover, clinical proteomics studies face computational challenges: 1) low sensitivity of instruments, 2) high data noise, and 3) high variability and low repeatability, although recent advances in clinical proteomics technology, LC-MS/MS proteomics, have been used to identify candidate molecular biomarkers in a diverse range of samples, including cells, tissues, serum/plasma, and other types of body fluids. RESULTS: Therefore, in the paper, we presented a peptidomics method for identifying cancer-related and isoform-specific peptides for clinical proteomics application from LC-MS/MS. First, we built a Peptidomic Database of Human Protein Isoforms, then created a peptidomics approach to perform a large-scale screen of breast cancer-associated alternative splicing isoform markers in clinical proteomics, and lastly performed four kinds of validations: biological validation (explainable index), exon array, statistical validation of independent samples, and extensive pathway analysis. CONCLUSIONS: Our results showed that alternative splicing isoform markers can act as independent markers of breast cancer and that the method for identifying cancer-specific protein isoform biomarkers from clinical proteomics application is an effective one for increasing the number of identified alternative splicing isoform markers in clinical proteomics.