Browsing by Subject "Biomarker discovery"
Now showing 1 - 4 of 4
Item Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq (BMC, 2020-12-03)
Zhang, Fan; Deng, Chris K.; Wang, Mu; Deng, Bin; Barber, Robert; Huang, Gang; Biochemistry and Molecular Biology, School of Medicine
Background: Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced, a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-Seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in proteomic technologies, such as liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), have shown tremendous power for the parallel characterization of large numbers of proteins in biological samples. Although previous qualitative comparative analyses have generally found poor correspondence between proteomics and microarray data, significantly higher degrees of correlation have been observed at the exon level. Combining protein and RNA data by searching LC-MS/MS data against a customized protein database derived from RNA-Seq may produce a subset of alternatively spliced protein isoform candidates that have higher confidence. Results: We developed a bioinformatics workflow to discover alternative splicing biomarkers from LC-MS/MS data using RNA-Seq. First, we retrieved high-confidence, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences into in silico Isoform Junction Peptides and created a customized alternative splicing database for MS searching. Lastly, we ran the Open Mass spectrometry Search Algorithm against the customized alternative splicing database with the breast cancer plasma proteome.
Twenty-six alternative splicing biomarker peptides with one single intron event and one exon skipping event were identified. Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System, and Hemostasis pathways, which are consistent with the 256 alternative splicing biomarkers from the RNA-Seq data. Conclusions: This paper presents a bioinformatics workflow for using RNA-Seq data to discover novel alternative splicing biomarkers from the breast cancer proteome. As a complement to the synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms, mass spectrometry and next-generation sequencing, and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at an early stage of cancer.

Item Integrative Analysis for Identifying Multi-Layer Modules in Precision Medicine (2020-12)
Yazdanparast, Aida; Wu, Huanmei; Li, Lang; Liu, Xiaowen; Liu, Yunlong; Zhang, Chi
Precision medicine aims to employ information from all modalities to develop a comprehensive view of disease progression and administer therapies tailored to the individual patient. A set of genomic features (gene CNVs, mutations, mRNA expression, and protein abundances) is associated with each patient, and it is hard to explain phenotypic similarities such as gene essentiality or variability in drug response at a single genomic level. Thus, to extract biological principles, it is critical to seek mutual information from multi-dimensional datasets. To address these concerns, we first conduct an integrated mRNA/protein analysis in both breast cancer cell lines and tumors, and most interestingly in the breast cancer subtypes. We identified cell lines that provide optimum heterogeneity models for studying the underlying biological processes of tumors.
Our systematic observation across multi-omics data identifies distinct subgroups of cancer cells and patients. Based on the identified signal transduction between mRNA and RPPA, we developed a biclustering model, Bi-EB, to characterize key genetic alterations that are shared in both cancer cell lines and patients. We integrated multiple types of omics data, including copy number variations, transcriptome, and proteome. Bi-EB adopts a data-driven statistics strategy, using the Expectation-Maximization (EM) algorithm to extract the foreground bicluster pattern from its background noise data in an iterative search. Using the Bi-EB algorithm, we selected translational gene sets that are characterized by highly correlated molecular profiles among RNA and proteins. To further investigate cell lines and tissues in breast cancer, we explore the relationship between genomic features and phenotypic factors. Using in vitro/in vivo drug screening data, we adopt a partial least squares regression method and develop a multi-modular approach to predict anticancer therapy benefits for ER-negative breast cancer patients. The joint multi-dimensional modules identified here provide new insights into the molecular mechanisms of drugs and cancer treatment.

Item A method for identifying discriminative isoform-specific peptides for clinical proteomics application (BioMed Central, 2016-08-22)
Zhang, Fan; Chen, Jake Yue; Department of Biohealth Informatics, IU School of Informatics and Computing
BACKGROUND: Clinical proteomics application aims at solving a specific clinical problem within the context of a clinical study. It has been growing rapidly in the field of biomarker discovery, especially in the area of cancer diagnostics. Until recently, protein isoforms had not been viewed as a new class of early diagnostic biomarkers for clinical proteomics. A protein isoform is one of several different forms of the same protein.
Different forms of a protein may be produced from single-nucleotide polymorphisms (SNPs), alternative splicing, or post-translational modifications (PTMs). Previous studies have shown that protein isoforms play critical roles in tumorigenesis, disease diagnosis, and prognosis. Identifying and characterizing protein isoforms is essential to the study of molecular mechanisms and early detection of complex diseases such as breast cancer. However, traditional methods used for protein isoform determination, such as EST sequencing, microarray profiling (exon array, exon-exon junction array), and mRNA next-generation sequencing, have limitations: 1) they do not measure at the protein level, 2) they give no information about the connectivity of nonadjacent exons, 3) they do not capture SNPs and PTMs, and 4) they have low reproducibility. Moreover, clinical proteomics studies face computational challenges: 1) low sensitivity of instruments, 2) high data noise, and 3) high variability and low repeatability, although recent advances in clinical proteomics technology, notably LC-MS/MS proteomics, have been used to identify candidate molecular biomarkers in a diverse range of samples, including cells, tissues, serum/plasma, and other types of body fluids. RESULTS: In this paper, we present a peptidomics method for identifying cancer-related and isoform-specific peptides for clinical proteomics application from LC-MS/MS data. First, we built a Peptidomic Database of Human Protein Isoforms, then created a peptidomics approach to perform a large-scale screen of breast cancer-associated alternative splicing isoform markers in clinical proteomics, and lastly performed four kinds of validation: biological validation (explainable index), exon array, statistical validation of independent samples, and extensive pathway analysis.
CONCLUSIONS: Our results showed that alternative splicing isoform markers can act as independent markers of breast cancer, and that the method for identifying cancer-specific protein isoform biomarkers from clinical proteomics application is an effective one for increasing the number of identified alternative splicing isoform markers in clinical proteomics.

Item Mining brain imaging and genetics data via structured sparse learning (2015-04-29)
Yan, Jingwen; Wu, Huanmei; Shen, Li; Fang, Shiaofen; Liu, Xiaowen
Alzheimer's disease (AD) is a neurodegenerative disorder characterized by gradual loss of brain functions, usually preceded by memory impairments. It widely affects aging Americans over 65 years old and is listed as the 6th leading cause of death. More importantly, unlike other diseases, loss of brain function in AD progression usually leads to a significant decline in self-care abilities. This undoubtedly exerts considerable pressure on family members, friends, communities, and society as a whole, owing to time-consuming daily care and high health care expenditures. In the past decade, while deaths attributed to the number one cause, heart disease, have decreased 16 percent, deaths attributed to AD have increased 68 percent. These conditions will continue to deteriorate as the population ages during the next several decades. To prevent such a health care crisis, substantial efforts have been made to help cure, slow, or stop the progression of the disease. The massive data generated through these efforts, such as multimodal neuroimaging scans and next-generation sequences, provide unprecedented opportunities for researchers to look deeper into the disease with more confidence and precision. While considerable effort has been made to apply existing machine learning and statistical models, the correlated structure and high dimensionality of imaging and genetics data are generally ignored or avoided through targeted analysis.
Therefore, their performance in imaging genetics studies is quite limited and leaves considerable room for improvement. The primary contribution of this work lies in the development of novel prior knowledge-guided regression and association models, and their applications to various neurobiological problems, such as the identification of cognitive performance-related imaging biomarkers and imaging genetics associations. In summary, this work has achieved the following research goals: (1) exploration of multimodal imaging biomarkers for various cognitive functions using group-guided learning algorithms, (2) development and application of a novel network structure guided sparse regression model, (3) development and application of a novel network structure guided sparse multivariate association model, and (4) improvement of computational efficiency through parallelization strategies.