Browsing by Author "Cheng, Lijun"
Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species (MDPI, 2022-10-30)
Yazdanparast, Aida; Li, Lang; Zhang, Chi; Cheng, Lijun; BioHealth Informatics, School of Informatics and Computing
Although several biclustering algorithms have been studied, few are used for cross-pattern identification across species in multi-omics data mining. A fast empirical Bayesian biclustering (Bi-EB) algorithm is developed to detect patterns shared both across integrated omics data and between species. Bi-EB addresses a critical clinical translational question with a bioinformatics strategy: how modules of genotype variation associated with phenotype can be identified from cancer cell screening data, and how these findings can be translated directly to a cancer patient subpopulation. Bi-EB introduces, for the first time, an empirical Bayesian probabilistic interpretation and a ratio strategy to detect pairwise regulation patterns among species and variations across multiple omics layers, such as protein and mRNA, at the gene level. An expectation-maximization (EM) algorithm extracts the foreground concurrent variations from the background noise by adjusting two parameters: the bicluster membership probability threshold Ac and the bicluster average probability p. Three simulation experiments and two real mRNA and protein data analyses, conducted on The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE), verify that Bi-EB significantly improves clustering recovery and relevance accuracy, outperforming seven other biclustering methods, namely Cheng and Church (CC), xMOTIFs, BiMax, Plaid, Spectral, FABIA, and QUBIC, with a recovery score of 0.98 and a relevance score of 0.99.
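The EM step described above can be illustrated with a minimal sketch. It is not the published Bi-EB model: it assumes a generic two-component Gaussian mixture over a one-dimensional list of values (for example, hypothetical mRNA-protein ratios), treats the high-mean component as the foreground, keeps entries whose posterior membership is at least a threshold playing the role of Ac, and reports their mean membership in the role of p. The function names and toy values are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_foreground_background(values, n_iter=200, tol=1e-9, min_sd=1e-3):
    """Fit a two-component Gaussian mixture (background + foreground) by EM.

    Returns, for each value, the posterior probability of belonging to the
    foreground component, taken from the final E-step.
    """
    mu_b, mu_f = min(values), max(values)          # background starts low, foreground high
    sd_b = sd_f = max((mu_f - mu_b) / 4, min_sd)
    pi_f = 0.5                                     # prior weight of the foreground
    prev_ll = -math.inf
    resp = []
    for _ in range(n_iter):
        # E-step: posterior foreground membership of every value
        resp, ll = [], 0.0
        for x in values:
            f = pi_f * normal_pdf(x, mu_f, sd_f)
            b = (1 - pi_f) * normal_pdf(x, mu_b, sd_b)
            total = f + b + 1e-300                 # guard against underflow to zero
            resp.append(f / total)
            ll += math.log(total)
        if ll - prev_ll < tol:                     # log-likelihood has stopped improving
            break
        prev_ll = ll
        # M-step: re-estimate mixing weight, means and standard deviations
        n_f = sum(resp)
        n_b = len(values) - n_f
        pi_f = n_f / len(values)
        mu_f = sum(r * x for r, x in zip(resp, values)) / n_f
        mu_b = sum((1 - r) * x for r, x in zip(resp, values)) / n_b
        sd_f = max(math.sqrt(sum(r * (x - mu_f) ** 2 for r, x in zip(resp, values)) / n_f), min_sd)
        sd_b = max(math.sqrt(sum((1 - r) * (x - mu_b) ** 2 for r, x in zip(resp, values)) / n_b), min_sd)
    return resp


def select_members(resp, ac):
    """Keep entries with membership probability >= ac and return their mean probability p."""
    members = [i for i, r in enumerate(resp) if r >= ac]
    p = sum(resp[i] for i in members) / len(members) if members else 0.0
    return members, p


# Toy values (invented): six background-like entries, four foreground-like entries
values = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 5.0, 5.2, 4.8, 5.1]
resp = em_foreground_background(values)
members, p = select_members(resp, ac=0.5)
print(members, round(p, 3))
```

Raising the threshold shrinks the selected set to entries the mixture is more certain about, which is the trade-off a membership cut-off such as Ac controls.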
Bi-EB is also used to determine the mRNA-to-protein causality patterns shared between patients and cancer cells in the TCGA and CCLE breast cancer data. A module of clinically well-known treatment-target proteins, comprising estrogen receptor (ER), ER (p118), AR, BCL2, cyclin E1, and IGFBP2, is identified in accordance with the corresponding mRNA expression variations in the luminal-like subtype. Ten genes, including CCNB1, CDH1, KDR, RAB25, and PRKCA, are found for the first time to maintain high mRNA-protein accordance in both breast cancer patients and cell lines of the basal-like subtype. Bi-EB provides a useful biclustering tool for discovering cross patterns hidden in multiple data matrices (omics) and across species. Implementing Bi-EB in the clinical setting will have a direct impact on translational research guided by cancer cell screening.

A bioinformatics approach for precision medicine off-label drug drug selection among triple negative breast cancer patients (Oxford Academic, 2016-07)
Cheng, Lijun; Schneider, Bryan P.; Li, Lang; Medical and Molecular Genetics, School of Medicine
Cancer has been extensively characterized on the basis of genomics. Integrating genetic information about cancers with data on how they respond to target-based therapy can help optimize cancer treatment. OBJECTIVE: The increasing use of sequencing technology in cancer research and clinical practice has greatly advanced our understanding of cancer mechanisms, and cancer precision medicine is becoming a reality. Although off-label drug use is common practice in treating cancer, it suffers from the lack of a knowledge base for proper cancer drug selection. This pressing need becomes even more apparent given the growing volume of genomics data.
METHODS: In this paper, a personalized medicine knowledge base is constructed by integrating various cancer drugs, drug-target databases, and knowledge sources for the proper selection of cancer drugs and their targets. Based on this knowledge base, a bioinformatics approach for cancer drug selection in precision medicine is developed. It integrates personal molecular profile data, including copy number variation, mutation, and gene expression. RESULTS: By analyzing data from 85 triple negative breast cancer (TNBC) patients in The Cancer Genome Atlas, we have shown that 71.7% of the TNBC patients have FDA-approved drug targets and 51.7% of the patients have more than one drug target. Sixty-five drug targets are identified as TNBC treatment targets, and 85 candidate drugs are recommended. Many existing TNBC candidate targets were identified, such as Poly (ADP-Ribose) Polymerase 1 (PARP1), Cell division protein kinase 6 (CDK6), and epidermal growth factor receptor. We also found additional targets that have not yet been fully investigated in TNBC, such as Gamma-Glutamyl Hydrolase (GGH), Thymidylate Synthetase (TYMS), Protein Tyrosine Kinase 6 (PTK6), Topoisomerase (DNA) I, Mitochondrial (TOP1MT), and Smoothened, Frizzled Class Receptor (SMO). The target and drug selection strategy is also fully supported by drug screening data on TNBC cell lines in the Cancer Cell Line Encyclopedia. CONCLUSIONS: The proposed bioinformatics approach lays a foundation for cancer precision medicine.
It supplies a much-needed knowledge base for off-label cancer drug use in the clinic.

Comprehensive comparison of molecular portraits between cell lines and tumors in breast cancer (BioMed Central, 2016-08-22)
Jiang, Guanglong; Zhang, Shijun; Yazdanparast, Aida; Li, Meng; Pawar, Aniruddha Vikram; Liu, Yunlong; Inavolu, Sai Mounika; Cheng, Lijun; Department of Medical and Molecular Genetics, IU School of Medicine
Background: Proper cell models for primary breast tumors have long been a focal point of breast cancer research. Genomic comparison between cell lines and tumors reveals their similarities and differences and helps select the right cell models to mimic tumor tissue, so that drug response can be properly evaluated in vitro. In this paper, we present a comprehensive comparison of copy number variation (CNV), mutation, mRNA expression, and protein expression between 68 breast cancer cell lines and 1375 primary breast tumors. Results: Using whole-genome expression arrays, strong correlations were observed between cells and tumors. PAM50 gene expression partially differentiated both cells and tumors into four major breast cancer subtypes: Luminal A, Luminal B, HER2amp, and Basal-like. In general, genomic CNV patterns across chromosomes were observed in both tumors and cells. High C > T and C > G substitution rates were observed in both cells and tumors, while the cells had slightly higher somatic mutation rates than tumors. Clustering analysis of protein expression data can reasonably recover the breast cancer subtypes in cell lines and tumors. Although the drug-targeted proteins ER/PR and an interesting mTOR/GSK3/TS2/PDK1/ER_P118 cluster showed consistent patterns between cells and tumors, protein-based correlations between cells and tumors were low. The consistency of mRNA versus protein expression between cell lines and tumors reaches 0.7076.
Important breast cancer drug targets, namely ESR1, PGR, HER2, EGFR, and AR, show high similarity in mRNA and protein variation in both tumors and cell lines. GATA3 and RPS6KB1 are two promising drug targets for breast cancer. A total score derived from the four correlations among the four molecular profiles suggests that cell lines BT483, T47D, and MDAMB453 have the highest similarity to tumors. Conclusions: The integrated data across these multiple platforms demonstrate both similarities and differences in molecular features between breast cancer tumors and cell lines. Cell lines mirror some, but not all, of the molecular properties of primary tumors. These results add evidence for selecting cell line models for breast cancer research.

Distinct molecular pathways in ovarian endometrioid adenocarcinoma with concurrent endometriosis (Wiley, 2018)
Zhang, Chi; Wang, Xiyin; Anaya, Yanett; Parodi, Luca; Cheng, Lijun; Anderson, Matthew L.; Hawkins, Shannon M.; Medicine, School of Medicine
Women with endometriosis, a benign growth of endometrial tissue outside the uterine cavity, are at increased risk of specific histotypes of epithelial ovarian cancer, such as ovarian endometrioid adenocarcinoma (OEA). Women with OEA who have endometriosis at the time of surgical staging demonstrate improved clinical prognosis compared with women with OEA without evidence of endometriosis. However, the molecular contributions of the endometriotic tumor microenvironment to these ovarian cancers remain poorly understood. As a starting point, we used a genome-wide transcriptomic profiling platform to compare specimens of OEA from women with and without concurrent endometriosis and benign reproductive tract tissues, including proliferative endometrium and typical and atypical endometrioma samples (n = 20).
Principal component analysis revealed distinct clustering between benign and malignant samples, as well as between malignant samples with and without concurrent endometriosis. Examination of gene signatures revealed that OEA with concurrent endometriosis has a unique molecular signature compared with OEA without concurrent endometriosis, distinguished by 682 uniquely differentially expressed genes (|fold change| > 1.5, p < 0.01). Bioinformatic analysis of these differentially expressed gene products using Ingenuity Pathway Analysis revealed activation of NFkB signaling, an inflammatory signaling pathway that is constitutively active in endometriosis. DAVID functional annotation clustering further revealed enrichment in RAS signaling, as both cytoskeleton organization and GTPase regulator activity relied heavily on RAS protein signal transduction. Gene set enrichment analysis highlighted immune and inflammatory nodes involved in OEA with concurrent endometriosis. These observations provide novel resources for understanding molecular subtleties potentially involved in OEA within the context of the endometriotic tumor microenvironment.

DSCN: Double-target selection guided by CRISPR screening and network (Public Library of Science, 2022-08-19)
Liu, Enze; Wu, Xue; Wang, Lei; Huo, Yang; Wu, Huanmei; Li, Lang; Cheng, Lijun; Medicine, School of Medicine
Cancer is a complex disease that usually involves multiple disease mechanisms. Target combinations are a better strategy than single targets for developing cancer therapies. However, target combinations are generally more difficult to predict. Current CRISPR-Cas9 technology enables genome-wide screening for potential targets, but only a handful of genes have been screened as target combinations. Thus, an effective computational approach for selecting candidate target combinations is highly desirable. Selected target combinations also need to be translatable between cell lines and cancer patients.
We have therefore developed DSCN (double-target selection guided by CRISPR screening and network), a method that matches expression levels in patients and gene essentialities in cell lines through a spectral-clustered protein-protein interaction (PPI) network. In DSCN, a sub-sampling approach is developed to model first-target knockdown and its impact on the PPI network, which also facilitates the selection of a second target. Our analysis first demonstrated a high correlation (R² = 0.75) between the differential gene expression predicted by the DSCN sub-sampling-based gene knockdown model and the expression observed in 22 pancreatic cell lines before and after MAP2K1 and MAP2K2 inhibition. Various scoring schemes were evaluated within the DSCN algorithm. The 'diffusion-path' method showed the greatest statistical power in differentiating known synthetic lethal (SL) from non-SL gene pairs (P = 0.001) in pancreatic cancer. The superior performance of DSCN over existing network-based algorithms, such as OptiCon and VIPER, in selecting target combinations is attributable to its ability to calculate combinations for any gene pair, whereas other approaches focus on combinations among optimized regulators in the network. DSCN is also at least ten times faster than the other methods. Finally, when DSCN was applied to predict target and drug combinations for individual samples (DSCNi), the predicted target combinations correlated highly with real synergistic combinations in pancreatic cell lines (P = 1e-5).
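The general idea of scoring a gene pair by diffusion over a PPI network can be sketched as below. This is not DSCN's 'diffusion-path' scheme, whose exact definition is not given here; the sketch assumes a random walk with restart on an undirected, unweighted network and scores a pair by the symmetric average of the two visiting probabilities. The restart probability of 0.3, the function names, and the toy network are hypothetical.

```python
def random_walk_with_restart(adj, seed, restart=0.3, n_iter=500, tol=1e-12):
    """Stationary visiting probabilities of a walk that returns to `seed` with
    probability `restart` at each step. `adj` must list neighbours symmetrically."""
    nodes = list(adj)
    prob = {n: 0.0 for n in nodes}
    prob[seed] = 1.0
    for _ in range(n_iter):
        new = {n: 0.0 for n in nodes}
        for n in nodes:
            neighbours = adj[n]
            if neighbours:
                share = (1 - restart) * prob[n] / len(neighbours)  # spread mass evenly
                for m in neighbours:
                    new[m] += share
            else:
                new[seed] += (1 - restart) * prob[n]  # isolated node: mass returns to the seed
        new[seed] += restart                          # restart term keeps total mass at 1
        delta = sum(abs(new[n] - prob[n]) for n in nodes)
        prob = new
        if delta < tol:
            break
    return prob


def pair_score(adj, gene_a, gene_b, restart=0.3):
    """Hypothetical symmetric diffusion score for a candidate gene pair."""
    from_a = random_walk_with_restart(adj, gene_a, restart)
    from_b = random_walk_with_restart(adj, gene_b, restart)
    return 0.5 * (from_a[gene_b] + from_b[gene_a])


# Toy PPI network (invented): a simple chain G1 - G2 - G3 - G4 - G5
ppi = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3", "G5"],
    "G5": ["G4"],
}
print(pair_score(ppi, "G1", "G2"), pair_score(ppi, "G1", "G5"))
```

Network-proximal pairs receive higher scores than distant ones; a real scheme would also need to weight edges and relate the score to knockdown effects, which this sketch does not attempt.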
In summary, DSCN is a highly effective computational method for selecting target combinations.

Essentiality and Transcriptome-Enriched Pathway Scores Predict Drug-Combination Synergy (MDPI, 2020-09-07)
Li, Jin; Huo, Yang; Wu, Xue; Liu, Enze; Zeng, Zhi; Tian, Zhen; Fan, Kunjie; Stover, Daniel; Cheng, Lijun; Li, Lang; Medicine, School of Medicine
In predicting the synergy of drug combinations, systems pharmacology models expand the scope of experimental screening and overcome the limitations of current computational models, namely their lack of mechanistic interpretation and of integration of gene essentiality. We therefore investigated the synergy of drug combinations for cancer therapies using records in NCI ALMANAC, and we employed logistic regression to test the statistical significance of gene and pathway features in that interaction. We trained our predictive models using 43 NCI-60 cell lines, 165 KEGG pathways, and 114 drug pairs. Drug-combination synergy scores showed a stronger correlation with pathway features than with gene features in overall trend analysis, and a significant association with both genes and pathways in genome-wide association analyses. However, we observed little overlap between significant gene expressions and essentialities, and no significant evidence associating target and non-target genes with their pathways. We validated four drug-combination pathways, formed by two drug combinations (Nelarabine-Exemestane and Docetaxel-Vemurafenib) and two signaling pathways (PI3K-AKT and AMPK), in 16 cell lines.
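Testing whether a single feature is associated with a binary synergy label by logistic regression can be sketched as follows. The sketch assumes one continuous feature (a hypothetical pathway score), fits the model by Newton-Raphson, and applies a two-sided Wald test to the slope; the study's actual feature construction, covariates, and software are not described here, and the data are invented.

```python
import math


def fit_logistic(x, y, n_iter=100, tol=1e-10):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1, se_b1, p_value); p_value is a two-sided Wald test of b1 = 0.
    """
    b0 = b1 = 0.0

    def information(b0, b1):
        # fitted probabilities and entries of the 2x2 Fisher information matrix
        mu = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [m * (1.0 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        return mu, h00, h01, h11

    for _ in range(n_iter):
        mu, h00, h01, h11 = information(b0, b1)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))                # score for the intercept
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))  # score for the slope
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det                          # Newton step = inverse(information) @ score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break

    _, h00, h01, h11 = information(b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))              # slope entry of inverse information
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))                  # two-sided normal tail probability
    return b0, b1, se_b1, p_value


# Toy data (invented): a pathway score per drug pair and whether the pair was synergistic
pathway_score = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
synergistic = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1, se_b1, p_value = fit_logistic(pathway_score, synergistic)
print(f"slope={b1:.3f}  SE={se_b1:.3f}  Wald p={p_value:.3g}")
```

Newton-Raphson is used because the logistic log-likelihood is concave and the two-parameter information matrix can be inverted by hand; with many features, a library implementation would be the practical choice.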
In conclusion, pathways significantly outperformed genes in predicting drug-combination synergy, and because gene expression and essentiality reflect very different mechanisms, they should be considered in combination rather than individually to improve this prediction.

Global Nonlinear Kernel Prediction for Large Dataset with a Particle Swarm Optimized Interval Support Vector Regression (IEEE, 2015-10)
Ding, Yongsheng; Cheng, Lijun; Pedrycz, Witold; Hao, Kuangrong; Department of Medical and Molecular Genetics, IU School of Medicine
A new global nonlinear predictor based on particle swarm-optimized interval support vector regression (PSO-ISVR) is proposed to address three issues (viz., kernel selection, model optimization, and kernel method speed) encountered when applying SVR to large data sets. The prediction model reduces the SVR computing overhead by dividing the input space and adaptively selecting optimized kernel functions, using PSO to obtain the optimal SVR parameters. To quantify the quality of the predictor, its generalization performance and execution speed are investigated based on statistical learning theory. In addition, experiments on synthetic data and on stock volume-weighted average price data demonstrate the effectiveness of the developed models. The experimental results show that the proposed PSO-ISVR predictor can improve computational efficiency and overall prediction accuracy compared with standard SVR and other regression methods.
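The PSO component can be illustrated with a basic global-best particle swarm optimizer. This is a generic sketch, not the PSO-ISVR procedure: instead of an SVR validation error over kernel parameters, it minimizes a hypothetical smooth stand-in objective over two parameters, and the inertia and acceleration coefficients (0.7298 and 1.49618, commonly used constriction values) are assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimisation of `objective` within box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for an SVR validation error over two tuning parameters."""
    a, b = params
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


best, best_val = pso_minimize(toy_validation_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

In an actual tuning loop, each objective call would train an SVR with the candidate parameters and return a cross-validated error, which is far more expensive than this toy function.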
The proposed PSO-ISVR provides an important tool for nonlinear regression analysis of big data.

Identification of Alternatively-Activated Pathways between Primary Breast Cancer and Liver Metastatic Cancer Using Microarray Data (MDPI, 2019-09-25)
Wang, Limei; Li, Jin; Liu, Enze; Kinnebrew, Garrett; Zhang, Xiaoli; Stover, Daniel; Huo, Yang; Zeng, Zhi; Jiang, Wanli; Cheng, Lijun; Feng, Weixing; Li, Lang; BioHealth Informatics, School of Informatics and Computing
Alternatively-activated pathways have been observed in biological experiments in cancer studies, but the concept has not been fully explored in computational cancer systems biology. Therefore, an alternatively-activated pathway identification method was proposed and applied to primary breast cancer and breast cancer liver metastasis using microarray data. Interestingly, the results show that cytokine-cytokine receptor interaction and calcium signaling were significantly enriched under both conditions. TGF beta signaling was found to be the hub in network topology analysis. In total, three types of alternatively-activated pathways were recognized. In the first type, found in the cytokine-cytokine receptor interaction pathway, four active alteration patterns in gene pairs were noticed, and thirteen cytokine-cytokine receptor pairs with inverse activity changes in both genes were verified by the literature. In the second type, some sub-pathways were active under only one condition. In the third type, nodes were significantly active in both conditions, but with different active genes. In the calcium signaling and TGF beta signaling pathways, nodes E2F5 and E2F4 were significantly active in primary breast cancer and metastasis, respectively. Overall, our study is the first to use microarray data to identify alternatively-activated pathways in breast cancer liver metastasis.
The results show that the proposed method is valid and effective and could help future research into the mechanism of breast cancer metastasis.

Identification of Potential Serum Protein Biomarkers and Pathways for Pancreatic Cancer Cachexia Using an Aptamer-Based Discovery Platform (MDPI, 2020-12-15)
Narasimhan, Ashok; Shahda, Safi; Kays, Joshua K.; Perkins, Susan M.; Cheng, Lijun; Schloss, Katheryn N. H.; Schloss, Daniel E. I.; Koniaris, Leonidas G.; Zimmers, Teresa A.; Surgery, School of Medicine
Patients with pancreatic ductal adenocarcinoma (PDAC) suffer debilitating and deadly weight loss, known as cachexia. Developing therapies requires biomarkers to diagnose and monitor cachexia; however, no such markers are in use. Using SomaScan, we measured ~1300 plasma proteins in 30 patients with PDAC vs. 11 controls. We found 60 proteins specific to local PDAC, 46 to metastatic PDAC, and 67 to the presence of >5% cancer weight loss (|FC| ≥ 1.5, p ≤ 0.05). Six were common to both cancer stages (Up: GDF15, TIMP1, IL1RL1; Down: CCL22, APP, CLEC1B). Four were common to local PDAC and cachexia (C1R, PRKCG, ELANE, SOST: all oppositely regulated) and four to metastatic PDAC and cachexia (SERPINA6, PDGFRA, PRSS2, PRSS1: all consistently changed), suggesting that stage and cachexia status might be molecularly separable. We found 71 proteins that correlated with cachexia severity as measured by weight loss grade, weight loss, skeletal muscle index, and radiodensity (|r| ≥ 0.50, p ≤ 0.05), including some known cachexia mediators/markers (LEP, MSTN, ALB) as well as novel proteins (e.g., LYVE1, C7, F2). Pathway, correlation, and upstream regulator analyses identified known (e.g., IL6, proteasome, mitochondrial dysfunction) and novel (e.g., Wnt signaling, NK cells) mechanisms.
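The two kinds of thresholds used above, a fold-change/p-value cut and a correlation cut, can be sketched as simple filters. The sketch assumes fold changes are reported on a signed linear scale (for example, -2.0 for a two-fold decrease), omits the significance test on r, and uses invented protein names and values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def passes_de_filter(fold_change, p_value, fc_cut=1.5, p_cut=0.05):
    """|FC| >= fc_cut and p <= p_cut, with FC on an assumed signed linear scale."""
    return abs(fold_change) >= fc_cut and p_value <= p_cut


def correlated_with_severity(levels, severity, r_cut=0.50):
    """|r| >= r_cut between protein levels and a severity measure (no p-value check)."""
    return abs(pearson_r(levels, severity)) >= r_cut


# Invented example data
severity = [1, 2, 3, 4, 5]                   # e.g. a weight-loss grade per patient
levels = {
    "protein_a": [2, 4, 6, 8, 10],
    "protein_b": [5, 4, 3, 2, 1],
    "protein_c": [1, 3, 2, 3, 1],
}
de_stats = {                                 # (signed fold change, p-value)
    "protein_a": (1.8, 0.01),
    "protein_b": (-2.0, 0.03),
    "protein_c": (1.2, 0.001),
    "protein_d": (2.0, 0.2),
}
correlated = sorted(n for n, v in levels.items() if correlated_with_severity(v, severity))
differential = sorted(n for n, (fc, p) in de_stats.items() if passes_de_filter(fc, p))
print(correlated, differential)
```

Using absolute values in both cuts keeps decreasing proteins and negative correlations, matching the two-sided thresholds written as |FC| and |r|.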
Overall, this study affords a basis for validation and provides insights into the processes underpinning cancer cachexia.

Identification of TMEM230 mutations in familial Parkinson's disease (Nature Research, 2016-07)
Deng, Han-Xiang; Shi, Yong; Yang, Yi; Ahmeti, Kreshnik B.; Miller, Nimrod; Huang, Cao; Cheng, Lijun; Zhai, Hong; Deng, Sheng; Nuytemans, Karen; Corbett, Nicola J.; Kim, Myung Jong; Deng, Hao; Tang, Baisha; Yang, Ziquang; Xu, Yanming; Chen, Piao; Huang, Bo; Gao, Xiao-Ping; Song, Zhi; Liu, Zhenhua; Fecto, Faisal; Siddique, Nailah; Foroud, Tatiana; Jankovic, Joseph; Ghetti, Bernardino; Nicholson, Daniel A.; Krainc, Dimitri; Melen, Onur; Vance, Jeffery M.; Pericak-Vance, Margaret A.; Ma, Yong-Chao; Rajput, Ali H.; Siddique, Teepu; Medical and Molecular Genetics, School of Medicine
Parkinson's disease is the second most common neurodegenerative disorder and has no effective treatment. It is generally sporadic, with unknown etiology. However, genetic studies of rare familial forms have identified mutations in several genes that are linked to typical Parkinson's disease or parkinsonian disorders. The pathogenesis of Parkinson's disease remains largely elusive. Here we report a locus for autosomal dominant, clinically typical and Lewy body-confirmed Parkinson's disease on the short arm of chromosome 20 (20pter-p12) and identify TMEM230 as the disease-causing gene. We show that TMEM230 encodes a transmembrane protein of secretory/recycling vesicles, including synaptic vesicles in neurons. Disease-linked TMEM230 mutants impair synaptic vesicle trafficking. Our data provide genetic evidence that a mutant transmembrane protein of synaptic vesicles in neurons is etiologically linked to Parkinson's disease, with implications for understanding the pathogenic mechanism of Parkinson's disease and for developing rational therapies.