IU Indianapolis ScholarWorks :: Browsing by Author "Chen, Jake"

Browsing by Author "Chen, Jake"


A neural network approach to multi-biomarker panel discovery by high-throughput plasma proteomics profiling of breast cancer
(Springer Nature, 2013) Zhang, Fan; Chen, Jake; Wang, Mu; Drabier, Renee; Computer and Information Science, Purdue School of Science
Background: In the past several years, there has been increasing interest and enthusiasm in molecular biomarkers as tools for early detection of cancer. Plasma proteomics profiling based on liquid chromatography tandem mass spectrometry (LC/MS/MS) is a promising technology platform for studying candidate protein biomarkers for early detection of cancer. Factors such as inherent variability, protein detectability limitation, and peptide discovery biases among LC/MS/MS platforms have made the classification and prediction of proteomics profiles challenging. Developing neural network-based proteomics data analysis methods to identify multi-protein biomarker panels for breast cancer diagnosis provides hope for improving both the sensitivity and the specificity of candidate cancer biomarkers for early detection. Results: In our previous method, we developed a Feed Forward Neural Network (FFNN)-based method to build a classifier for plasma samples of breast cancer and then applied the classifier to predict a blind dataset of breast cancer. However, the optimal combination C* in our previous method was actually determined by applying the trained FFNN to the testing set with that combination. Therefore, in this paper, we applied a three-way data split to the FFNN for training, validation, and testing. We found that the FFNN model based on the three-way data split outperforms our previous method: prediction performance on the testing set improved from AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, specificity = 82.5% to AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, specificity = 87.5%. Conclusions: Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, which is consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and that it can be applied to other clinical proteomics.
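The three-way split and the reported test-set metrics can be illustrated with a minimal sketch. The split fractions, the function names, and the confusion-matrix counts are assumptions made for illustration: the abstract does not state the test-set size, but 40 cancer and 40 control samples with 33 true positives and 35 true negatives would reproduce the reported figures.

```python
import random

def three_way_split(labels, fractions=(0.5, 0.25, 0.25), seed=0):
    """Stratified split of sample indices into train / validation / test.

    The fractions are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = int(len(idx) * fractions[0])
        n_val = int(len(idx) * fractions[1])
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test set
    return train, val, test

def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts (assumed 40 cases and 40 controls in the test set).
metrics = binary_metrics(tp=33, fp=5, tn=35, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The point of the split is that the marker combination is chosen on the validation set, so the test set is touched only once, for the final performance estimate.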
COMPARATIVE ANALYSIS OF THE DISCORDANCE BETWEEN THE GLOBAL TRANSCRIPTIONAL AND PROTEOMIC RESPONSE OF THE YEAST SACCHAROMYCES CEREVISIAE TO DELETION OF THE F-BOX PROTEIN, GRR1
(2010-05) Heyen, Joshua William; Goebl, Mark, 1958-; Roach, Peter J.; Clemmer, David E.; Wang, Mu; Chen, Jake
The Grr1 (Glucose Repression Resistant) protein in Saccharomyces cerevisiae is an F-box protein for the E3 ubiquitin ligase protein complex known as the SCFGrr1 (Skp, Cullin, F-box). F-box proteins serve as substrate receptors for this complex, and in this capacity Grr1 promotes the ubiquitylation and subsequent proteasomal degradation of a number of intracellular protein substrates. Substrates of SCFGrr1 include the G1-S phase cyclins, Cln1 and Cln2; the Cdc42 effectors and cell polarity proteins, Gic1 and Gic2; the FCH-bar domain protein, Hof1, required for cytokinesis; the meiosis-activating serine/threonine protein kinase, Ime2; the transcriptional regulators of glucose transporters, Mth1 and Std1; and the mitochondrial retrograde response inhibitor Mks1. Stabilization of these substrates leads to pleiotropic phenotypic defects in grr1Δ strains, including resistance to glucose repression, accumulation of grr1Δ cells in G2 and M phase of the cell cycle, sensitivity to osmotic stress, and resistance to divalent cations. However, many of these phenotypes are not reflected at the gene expression level. We conducted a quantitative genomic and proteomic comparison of 914 loci in a grr1Δ and wild-type strain grown to early log-phase in glucose media. These loci encompassed 16.7% of the Saccharomyces proteome, of which 22.3% exhibited discordance between gene and protein expression. GO process enrichment analysis revealed that discordant loci were enriched in the processes of “trafficking”, “mitosis”, and “carbon/energy” metabolism. Here we show that these instances of discordance are biologically relevant and in fact reflect phenotypes of grr1Δ strains not evident at the transcriptional level. Additionally, through combined biochemical and network analysis of discordant loci among “carbon and energy metabolism”, we were able not only to construct a model for central carbon metabolism in grr1Δ strains but also to elucidate a novel molecular event that may serve to regulate glucose repression of genes needed for respiration in response to changes in glucose concentration.
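One way to picture a discordance call between transcript and protein changes is sketched below. The log2 fold-change threshold, the three-way up/down/unchanged call, and the locus names are assumptions made for illustration; the abstract does not state the criterion actually used in the thesis.

```python
def call_direction(log2_fold_change, threshold=1.0):
    """Call a change as up, down, or unchanged (the threshold is an assumed cutoff)."""
    if log2_fold_change >= threshold:
        return "up"
    if log2_fold_change <= -threshold:
        return "down"
    return "unchanged"

def classify_locus(mrna_log2fc, protein_log2fc, threshold=1.0):
    """A locus is discordant when transcript and protein calls disagree."""
    mrna_call = call_direction(mrna_log2fc, threshold)
    protein_call = call_direction(protein_log2fc, threshold)
    return "concordant" if mrna_call == protein_call else "discordant"

def discordance_rate(loci, threshold=1.0):
    """Fraction of loci whose transcript and protein calls disagree."""
    labels = [classify_locus(m, p, threshold) for m, p in loci.values()]
    return labels.count("discordant") / len(labels)

# Hypothetical (mRNA, protein) log2 fold changes, mutant versus wild type.
example_loci = {
    "locus_a": (2.0, 2.1),    # both up: concordant
    "locus_b": (0.1, 1.5),    # protein up, transcript flat: discordant
    "locus_c": (-1.2, 1.3),   # opposite directions: discordant
    "locus_d": (0.2, -0.3),   # both unchanged: concordant
}
print(discordance_rate(example_loci))
```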
Identification of Publications on Disordered Proteins from PubMed
(2012-08-07) Sirisha, Peyyeti; Xia, Yuni; Dunker, A. Keith; Chen, Jake
The literature on disordered proteins has been on the rise. As the number of publications increases, the time and effort needed to manually identify the relevant publications and protein information to add to a centralized repository (called DisProt) is becoming arduous and critical. Existing search facilities on PubMed can retrieve a seemingly large number of publications based on keywords, but they do not support ranking them by the probability that the protein names mentioned in a given abstract will be added to DisProt. This thesis explores a novel system that uses disorder predictors and context-based dictionary methods to quickly identify publications on disordered proteins from the PubMed database. NLProt, which is built around Support Vector Machines, is used to identify protein names, and PONDR-FIT, which is an Artificial Neural Network-based meta-predictor, is used to identify protein disorder. The work done in this thesis is of immediate significance in identifying disordered protein names. We tested the new system on 100 abstracts from DisProt (these abstracts were found to be relevant to disordered proteins and were added to DisProt manually by the annotators). The system had an accuracy of 87% on this test set. We then took another 100 recently added abstracts from PubMed and ran our algorithm on them. This time it had an accuracy of 68%. We suggest improvements to increase the accuracy and believe that this system can be applied to identifying disordered proteins from the literature.
MOLECULAR PROFILING IN BREAST CANCER AND TOXICOGENOMICS
(2011-08-23) Liu, Jiangang; Zhou, Yaoqi; Dunker, A. Keith; Chen, Jake; Uversky, Vladimir N.; Liu, Yunlong; Li, Dan S.
This dissertation presents a body of research that attempts to tackle the ‘overfitting’ problem for gene signature and biomarker development from two different angles (mechanistic and computational). To achieve a deeper understanding of cancer molecular mechanisms, this study presents new approaches to derive gene signatures for various biological phenotypes, including breast cancer, in the context of well-defined and mechanistically associated biological pathways. We found that the pattern of gene expression in the cell cycle pathway can indeed serve as a powerful biomarker for breast cancer prognosis. We further built a predictive model for prognosis based on the cell cycle gene signature, and found our model to be more accurate than the Amsterdam 70-gene signature when tested with multiple gene expression datasets generated from several patient populations. Aside from demonstrating the effectiveness of dimensionality reduction, phenotypic dissection, and prognostic or diagnostic prediction, this approach also provides an alternative to the current methodology of identifying gene expression markers that link to biological mechanism. This dissertation also presents the development of a novel feature selection algorithm called Predictive Power Estimate Analysis (PPEA) to tackle overfitting computationally. The algorithm iteratively applies a two-way bootstrapping procedure to estimate the predictive power of each individual gene, and makes it possible to construct a predictive model from a much smaller set of genes with the highest predictive power. Using DrugMatrix™ rat liver data, we identified genomic biomarkers of hepatic specific injury for inflammation, cell death, and bile duct hyperplasia (BDH). We demonstrated that the signature genes were mechanistically related to the phenotype the signature was intended to predict (e.g., 17 of the top 20 genes for inflammation selected by PPEA were members of the NF-kB pathway, which is a key pro-inflammatory pathway for a xenobiotic response). The top four-gene signature for BDH has been further validated by QPCR in a toxicology lab. This is important because our results suggest that the PPEA model not only largely deters the overfitting problem, but also has the capability to elucidate mechanism(s) of drug action and/or toxicity.
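The idea of estimating per-gene predictive power by repeated resampling can be sketched as follows. This is not the PPEA algorithm itself: the abstract does not define "two-way bootstrapping", so the sketch assumes it means resampling both samples and genes in each round, and it scores each gene with a simple nearest-class-mean rule on the out-of-bag samples. All names, parameters, and data are hypothetical.

```python
import random
from statistics import mean

def gene_predictive_power(expr, labels, n_rounds=200, gene_fraction=0.5, seed=0):
    """Estimate each gene's out-of-bag accuracy over many resampling rounds.

    expr:   {gene_name: [expression value per sample]}
    labels: [0 or 1 per sample]
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    n = len(labels)
    n_genes = max(1, int(len(genes) * gene_fraction))
    scores = {g: [] for g in genes}
    for _ in range(n_rounds):
        # Resample the samples with replacement (first "way").
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        out_bag = [i for i in range(n) if i not in in_bag_set]
        if not out_bag or {labels[i] for i in in_bag} != {0, 1}:
            continue  # need held-out samples and both classes in the bag
        # Draw a random subset of genes (second "way", as assumed here).
        for g in rng.sample(genes, n_genes):
            mu0 = mean(expr[g][i] for i in in_bag if labels[i] == 0)
            mu1 = mean(expr[g][i] for i in in_bag if labels[i] == 1)
            hits = 0
            for i in out_bag:
                predicted = 1 if abs(expr[g][i] - mu1) < abs(expr[g][i] - mu0) else 0
                hits += predicted == labels[i]
            scores[g].append(hits / len(out_bag))
    return {g: (mean(s) if s else None) for g, s in scores.items()}

# Hypothetical data: one gene separates the classes, the other does not.
labels = [0] * 5 + [1] * 5
expr = {
    "informative_gene": [0.0, 0.1, 0.2, 0.1, 0.0, 5.0, 5.1, 4.9, 5.2, 5.0],
    "noise_gene": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
}
power = gene_predictive_power(expr, labels)
ranked = sorted(power, key=power.get, reverse=True)
print(ranked)
```

Scoring genes only on samples left out of each bag is what guards against overfitting here: a gene that merely fits the resampled data gains nothing on the held-out samples.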
Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions
(2014-01) Huang, Fei; Dunker, A. Keith; Chen, Jake; Hurley, Thomas D., 1961-; Shen, Li
Intrinsically disordered proteins (IDPs) are flexible proteins without defined 3D structures. Studies show that IDPs are abundant in nature and actively involved in numerous biological processes. Two crucial subjects in the study of IDPs are analyzing their functions and identifying them. We thus carried out three projects to better understand IDPs. In the 1st project, we propose a method that separates IDPs into different function groups. We used the CH-CDF plot approach, which is based on the combined use of two predictors and subclassifies proteins into 4 groups: structured, mixed, disordered, and rare. Studies show different structural biases for each group. The mixed class has more order-promoting residues and more ordered regions than the disordered class. In addition, the disordered class is highly active in mitosis-related processes, among others. Meanwhile, the mixed class is highly associated with signaling pathways, where having both ordered and disordered regions could possibly be important. The 2nd project is about identifying whether an unknown protein is entirely disordered. One of the earliest predictors for this purpose, the charge-hydropathy plot (C-H plot), exploited the charge and hydropathy features of the protein. Not only is this algorithm simple yet powerful, but its input parameters, charge and hydropathy, are also informative and readily interpretable. We found that using different hydropathy scales significantly affects the prediction accuracy. Therefore, we sought to identify a new hydropathy scale that optimizes the prediction. This new scale achieves an accuracy of 91%, a significant improvement over the original 79%. In our 3rd project, we developed a per-residue C-H IDP predictor, in which three hydropathy scales are optimized individually. This is to account for the amino acid composition differences in three regions of a protein sequence (N terminus, C terminus, and internal). We then combined them into a single per-residue predictor that achieves an accuracy of 74% for per-residue predictions for proteins containing long IDP regions.
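A whole-protein charge-hydropathy classifier can be sketched in a few lines. The sketch assumes the Kyte-Doolittle hydropathy scale rescaled to the range 0 to 1 and the boundary line commonly quoted for the original C-H plot, mean net charge = 2.785 × mean hydropathy − 1.151; neither is given in the abstract, and the optimized scale developed in the thesis is not reproduced here. Function names and the test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (stand-in for the thesis's optimized scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(seq):
    """Mean hydropathy with each residue value rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)

def mean_net_charge(seq):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)

def ch_plot_predicts_disorder(seq, slope=2.785, intercept=-1.151):
    """Predict disorder when the protein lies on the low-hydropathy side of the boundary.

    The boundary <R> = slope * <H> + intercept is an assumed parameterization.
    """
    boundary_hydropathy = (mean_net_charge(seq) - intercept) / slope
    return mean_hydropathy(seq) < boundary_hydropathy

# Hypothetical toy sequences: highly charged versus strongly hydrophobic.
print(ch_plot_predicts_disorder("EEEEKKKKEEEE"))  # charged, low hydropathy
print(ch_plot_predicts_disorder("LLLLIIIIVVVV"))  # uncharged, high hydropathy
```

Swapping in a different hydropathy dictionary is the only change needed to compare scales, which is why the choice of scale can shift prediction accuracy so directly.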
System biology modeling : the insights for computational drug discovery
(2014) Huang, Hui; Chen, Jake; Wu, Huanmei; Al Hasan, Mohammad; Liu, Yunlong; Zhou, Yaoqi
Traditional treatment strategy development for diseases involves the identification of target proteins related to disease states, and the interference of these proteins with drug molecules. Computational drug discovery and virtual screening from thousands of chemical compounds have accelerated this process. The thesis presents a comprehensive framework of computational drug discovery using system biology approaches. The thesis mainly consists of two parts: disease biomarker identification and disease treatment discoveries. The first part of the thesis focuses on the research in biomarker identification for human diseases in the post-genomic era, with an emphasis on system biology approaches such as using protein interaction networks. There are two major types of biomarkers: a Diagnostic Biomarker is expected to detect a given type of disease in an individual with both high sensitivity and specificity; a Predictive Biomarker serves to predict drug response before treatment is started. Both are essential before we even start seeking any treatment for the patients. In this part, we first studied how the coverage of the disease genes, the protein interaction quality, and gene ranking strategies can affect the identification of disease genes. Second, we addressed the challenge of constructing a central database to collect system-level data such as protein interactions and pathways. Finally, we built a case study for biomarker identification using diabetes. The second part of the thesis mainly addresses how to find treatments after disease identification. It specifically focuses on computational drug repositioning due to its low cost, few translational issues, and other benefits. First, we described how to implement literature mining approaches to build the disease-protein-drug connectivity map and demonstrated its superior performance compared to other existing applications. Second, we presented a valuable drug-protein directionality database, which filled the research gap of lacking alternatives for the experimental CMAP in the computational drug discovery field. We also extended the correlation-based ranking algorithms by including the underlying topology among proteins. Finally, we demonstrated how to study drug repositioning beyond the genomic level and from one dimension to two dimensions, with clinical side effects as prediction features.
