Zhou, Yaoqi; Liu, Jiangang; Dunker, A. Keith; Chen, Jake; Uversky, Vladimir N.; Liu, Yunlong; Li, Dan S.
2011-08-23
https://hdl.handle.net/1805/2636
http://dx.doi.org/10.7912/C2/916
Indiana University-Purdue University Indianapolis (IUPUI)

This dissertation presents a body of research that attempts to tackle the ‘overfitting’ problem in gene signature and biomarker development from two directions: mechanistic and computational. To gain a deeper understanding of cancer molecular mechanisms, this study presents new approaches for deriving gene signatures for various biological phenotypes, including breast cancer, in the context of well-defined, mechanistically associated biological pathways. We found that the pattern of gene expression in the cell cycle pathway can indeed serve as a powerful biomarker for breast cancer prognosis. We further built a predictive model for prognosis based on this cell cycle gene signature and found it to be more accurate than the Amsterdam 70-gene signature when tested on multiple gene expression datasets generated from several patient populations. Besides demonstrating effective dimensionality reduction, phenotypic dissection, and prognostic or diagnostic prediction, this approach offers an alternative to current methods of identifying gene expression markers, one in which the markers are linked to biological mechanism.

This dissertation also presents the development of a novel feature selection algorithm, Predictive Power Estimate Analysis (PPEA), that tackles overfitting computationally. The algorithm iteratively applies a two-way bootstrapping procedure to estimate the predictive power of each individual gene, making it possible to construct a predictive model from a much smaller set of genes with the highest predictive power. Using DrugMatrix™ rat liver data, we identified genomic biomarkers of three liver-specific injury phenotypes: inflammation, cell death, and bile duct hyperplasia.
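The abstract does not specify how PPEA's two-way bootstrap is carried out. The sketch below is one hypothetical reading, assuming that "two-way" means resampling both samples (drawn with replacement) and genes (random subsets) in each round, that each gene is scored by the mean out-of-bag accuracy of a simple nearest-centroid classifier fitted on the subsets containing it, and that the iteration keeps the top-scoring fraction of genes until a target signature size is reached. The function names, the nearest-centroid classifier, and every parameter value are illustrative assumptions, not the published PPEA.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(X, y, train_idx, test_idx, genes):
    """Fit a nearest-centroid classifier on train_idx using only `genes`,
    then return its accuracy on test_idx (hypothetical helper)."""
    centroids = {}
    for label in set(y[i] for i in train_idx):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    correct = 0
    for i in test_idx:
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda label: sum(
                (X[i][g] - c) ** 2 for g, c in zip(genes, centroids[label])
            ),
        )
        correct += int(pred == y[i])
    return correct / len(test_idx)


def estimate_predictive_power(X, y, candidate_genes, n_rounds=200, subset_size=3, rng=None):
    """Assumed two-way bootstrap: resample samples AND genes each round,
    crediting every gene in the subset with the out-of-bag accuracy."""
    rng = rng or random.Random(0)
    n = len(X)
    scores = {g: [] for g in candidate_genes}
    for _ in range(n_rounds):
        # Bootstrap over samples: draw n indices with replacement.
        train_idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train_idx)
        test_idx = [i for i in range(n) if i not in in_bag]
        # Skip degenerate rounds (no out-of-bag samples or one class in bag).
        if not test_idx or len(set(y[i] for i in train_idx)) < 2:
            continue
        # Bootstrap over genes: draw a random gene subset.
        genes = rng.sample(candidate_genes, min(subset_size, len(candidate_genes)))
        acc = nearest_centroid_accuracy(X, y, train_idx, test_idx, genes)
        for g in genes:
            scores[g].append(acc)
    return {g: (mean(s) if s else 0.0) for g, s in scores.items()}


def select_genes(X, y, target_size=2, keep_fraction=0.5, seed=0):
    """Iteratively re-estimate predictive power and keep the top fraction
    of genes until `target_size` genes remain."""
    rng = random.Random(seed)
    genes = list(range(len(X[0])))
    while len(genes) > target_size:
        power = estimate_predictive_power(X, y, genes, rng=rng)
        keep = max(target_size, int(len(genes) * keep_fraction))
        genes = sorted(genes, key=lambda g: power[g], reverse=True)[:keep]
    return genes


if __name__ == "__main__":
    # Synthetic data: genes 0 and 1 separate the two classes; genes 2-9 are noise.
    data_rng = random.Random(1)
    labels = [i % 2 for i in range(20)]
    expr = [
        [labels[i] * 10 + data_rng.random() if g < 2 else data_rng.random() for g in range(10)]
        for i in range(20)
    ]
    print(sorted(select_genes(expr, labels, target_size=2)))
```

Crediting every gene in a subset with the subset's accuracy is a deliberately simple choice: a gene that carries signal raises the accuracy of every subset it joins, so its average rises above that of noise genes, and repeated pruning then shrinks the candidate set toward a small signature.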
We demonstrated that the signature genes were mechanistically related to the phenotypes they were intended to predict (e.g., 17 of the top 20 genes selected by PPEA for inflammation were members of the NF-κB pathway, a key pro-inflammatory pathway in the xenobiotic response). The top four-gene signature for bile duct hyperplasia has been further validated by qPCR in a toxicology lab. This is important because our results suggest that the PPEA model not only largely mitigates the overfitting problem but can also elucidate the mechanism(s) of drug action and/or toxicity.
en-US
molecular profiling, breast cancer, toxicogenomics, feature selection
Biochemical markers; Gene expression; Breast -- Cancer; Genetic toxicology; Phenotype
MOLECULAR PROFILING IN BREAST CANCER AND TOXICOGENOMICS