IU Indianapolis ScholarWorks :: Browsing by Author "Wu, Cen"

Browsing by Author "Wu, Cen"

Now showing 1 - 9 of 9

Cervical Cancer Development: Implications of HPV16 E6E7-NFX1-123 Regulated Genes
(MDPI, 2021-12-08) Quist, Kevin M.; Solorzano, Isaiah; Wendel, Sebastian O.; Chintala, Sreenivasulu; Wu, Cen; Wallace, Nicholas A.; Katzenellenbogen, Rachel A.; Pediatrics, School of Medicine
High-risk human papillomavirus (HR HPV) causes nearly all cervical cancers, half of which are due to HPV type 16 (HPV16). HPV16 oncoprotein E6 (16E6) binds to NFX1-123 and dysregulates gene expression, but the clinical implications of this dysregulation are unknown. Additionally, the role of HPV16 E7 has not been studied in concert with NFX1-123 and 16E6. HR HPVs express both oncogenes, and transformation requires their expression, so we sought to investigate the effect of E7 on gene expression. This study's goal was to define gene expression profiles across cervical precancer and cancer stages, identify genes correlating with disease progression, assess patient survival, and validate findings in cell models. We analyzed NCBI GEO datasets containing transcriptomic data linked with cervical cancer stage and used LASSO analysis to identify cancer-driving genes. Keratinocytes expressing 16E6 and 16E7 (16E6E7) and exogenous NFX1-123 were tested for expression of the LASSO-identified genes. Ten of the nineteen genes, including CEBPD, NOTCH1, and KRT16, correlated with disease progression and affected survival. 16E6E7 in keratinocytes increased CEBPD, KRT16, and SLPI, and decreased NOTCH1. Exogenous NFX1-123 in 16E6E7 keratinocytes significantly increased CEBPD and NOTCH1 and reduced SLPI. This work demonstrates the clinical relevance of CEBPD, NOTCH1, KRT16, and SLPI, and shows the regulatory effects of 16E6E7 and NFX1-123.
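The abstract relies on LASSO analysis to narrow transcriptomic data down to candidate cancer-driving genes. As a hedged illustration of the general technique only (the authors' software, tuning, and preprocessing are not stated here), the sketch below fits a LASSO by cyclic coordinate descent with soft-thresholding on simulated data. The function names `soft_threshold` and `lasso_cd`, the objective scaling, and the penalty value are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # x_j' x_j / n for each column
    resid = y.astype(float).copy()      # residual y - X @ beta (beta starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual that excludes feature j
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)  # keep the residual in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


# Simulated (not real) data: 20 features, only the first three are truly active.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.standard_normal(200)
beta_hat = lasso_cd(X, y, lam=0.1)
print("selected columns:", np.flatnonzero(beta_hat))
```

Nonzero coefficients mark the selected features; the penalty also shrinks the retained coefficients toward zero by roughly `lam`, which is why LASSO is often followed by refitting or inference steps.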
Identifying Gene–Environment Interactions With Robust Marginal Bayesian Variable Selection
(Frontiers Media, 2021-12-08) Lu, Xi; Fan, Kun; Ren, Jie; Wu, Cen; Biostatistics & Health Data Science, School of Medicine
In high-throughput genetics studies, an important aim is to identify gene-environment (G×E) interactions associated with clinical outcomes. Recently, multiple marginal penalization methods have been developed and shown to be effective in G×E studies. However, within the Bayesian framework, marginal variable selection has not received much attention. In this study, we propose a novel marginal Bayesian variable selection method for G×E studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variables. With the incorporation of spike-and-slab priors, we have implemented a Gibbs sampler based on Markov chain Monte Carlo (MCMC). The proposed method outperforms a number of alternatives in extensive simulation studies. The utility of the marginal robust Bayesian variable selection method has been further demonstrated in case studies using data from the Nurses' Health Study (NHS). Some of the identified main and interaction effects from the real data analysis have important biological implications.
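To make "spike-and-slab priors with a Gibbs sampler" concrete, here is a minimal sketch of a standard point-mass spike-and-slab sampler for ordinary linear regression. It is explicitly not the robust marginal method described above: it assumes Gaussian errors, fits all predictors jointly, and uses hypothetical hyperparameters (`tau2`, `pi`, `a`, `b`). Each sweep draws every inclusion indicator with its coefficient integrated out, then the coefficient from its conditional normal, then the noise variance from its inverse-gamma conditional.

```python
import numpy as np


def spike_slab_gibbs(X, y, n_iter=1500, burn=500, tau2=1.0, pi=0.2,
                     a=1.0, b=1.0, seed=0):
    """Gibbs sampler for y = X beta + N(0, sigma2) noise with point-mass spike-and-slab priors:
    beta_j | gamma_j ~ gamma_j * N(0, tau2) + (1 - gamma_j) * delta_0,
    gamma_j ~ Bernoulli(pi), sigma2 ~ Inv-Gamma(a, b).
    Returns posterior inclusion probabilities and posterior means of beta."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = 1.0
    xtx = (X ** 2).sum(axis=0)
    resid = y - X @ beta
    incl = np.zeros(p)
    beta_sum = np.zeros(p)
    log_prior_odds = np.log(pi / (1.0 - pi))
    for it in range(n_iter):
        for j in range(p):
            r_j = resid + X[:, j] * beta[j]            # partial residual without feature j
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / tau2)   # slab posterior variance
            m = v * (X[:, j] @ r_j) / sigma2           # slab posterior mean
            # Log Bayes factor of slab vs. spike with beta_j integrated out
            log_bf = 0.5 * np.log(v / tau2) + 0.5 * m * m / v
            log_odds = np.clip(log_prior_odds + log_bf, -500.0, 500.0)
            prob = 1.0 / (1.0 + np.exp(-log_odds))
            new = rng.normal(m, np.sqrt(v)) if rng.random() < prob else 0.0
            resid = r_j - X[:, j] * new
            beta[j] = new
        # Conjugate update: sigma2 | rest ~ Inv-Gamma(a + n/2, b + RSS/2)
        sigma2 = 1.0 / rng.gamma(a + n / 2.0, 1.0 / (b + resid @ resid / 2.0))
        if it >= burn:
            incl += beta != 0.0
            beta_sum += beta
    kept = n_iter - burn
    return incl / kept, beta_sum / kept


# Simulated (not real) data: two active predictors out of ten.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(100)
pip, post_mean = spike_slab_gibbs(X, y)
print("posterior inclusion probabilities:", np.round(pip, 2))
```

The point-mass spike is what yields exact zeros in individual draws, so posterior inclusion probabilities can be read directly as selection evidence. A robust variant would replace the Gaussian likelihood with a heavy-tailed or quantile-based one, which changes the conditional updates.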
Interep: An R Package for High-Dimensional Interaction Analysis of the Repeated Measurement Data
(MDPI, 2022-03-19) Zhou, Fei; Ren, Jie; Liu, Yuwen; Li, Xiaoxi; Wang, Weiqun; Wu, Cen; Biostatistics and Health Data Science, School of Medicine
We introduce interep, an R package for interaction analysis of repeated measurement data with high-dimensional main and interaction effects. In G×E interaction studies, the forms of environmental factors play a critical role in determining how structured sparsity should be imposed in the high-dimensional scenario to identify important effects. Zhou et al. (2019) (PMID: 31816972) proposed a longitudinal penalization method to select main and interaction effects corresponding to the individual and group structure, respectively, which requires a mixture of individual- and group-level penalties. The R package interep implements generalized estimating equation (GEE)-based penalization methods under this sparsity assumption. Alternative methods have also been implemented in the package; these select effects only at the individual level and ignore the group-level interaction structure. In this software article, we first introduce the statistical methodology underlying the penalized GEE methods implemented in the package. Next, we present the usage of the core and supporting functions, followed by a simulation example with annotated R code. The R package interep is available on the Comprehensive R Archive Network (CRAN).
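The abstract ties main effects to individual-level penalties and interaction effects to group-level penalties. One plausible way to lay out such a design, sketched here under that assumption (it is not interep's internal data structure, and interep itself is an R package), stacks environmental main effects, each gene's main effect, and each gene's block of interactions with all environmental factors, and tags every column with the penalty level it would receive. The function `build_gxe_design` and its labeling scheme are hypothetical.

```python
import numpy as np


def build_gxe_design(G, E):
    """Stack environment main effects, gene main effects, and G x E interactions.

    Returns the design matrix, column names, and a group label per column:
      -1     environmental main effect (hypothetically left unpenalized),
       0     gene main effect (individual-level penalty),
       j + 1 interactions of gene j with all q environmental factors (one group).
    """
    n, p = G.shape
    q = E.shape[1]
    cols = [E]
    names = [f"E{k + 1}" for k in range(q)]
    groups = [-1] * q
    for j in range(p):
        gj = G[:, [j]]            # n x 1 main-effect column for gene j
        cols.append(gj)
        cols.append(gj * E)       # n x q block: G_j times each environmental factor
        names.append(f"G{j + 1}")
        names.extend(f"G{j + 1}xE{k + 1}" for k in range(q))
        groups.append(0)
        groups.extend([j + 1] * q)
    return np.hstack(cols), names, np.array(groups)


rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(5, 3)).astype(float)   # toy SNP genotypes coded 0/1/2
E = rng.standard_normal((5, 2))                      # two toy environmental factors
X, names, groups = build_gxe_design(G, E)
print(names)
print(groups)
```

With repeated measurements, each subject would contribute several rows, and a GEE-based fit would additionally model the within-subject correlation; the column layout itself would not change.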
Is Seeing Believing? A Practitioner’s Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies
(MDPI, 2024-09-16) Fan, Kun; Subedi, Srijana; Yang, Gongshun; Lu, Xi; Ren, Jie; Wu, Cen; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Variable selection methods have been extensively developed for and applied to cancer genomics data to identify important omics features associated with complex disease traits, including cancer outcomes. However, the reliability and reproducibility of the findings are in question if valid inferential procedures are not available to quantify their uncertainty. In this article, we provide a gentle but systematic review of high-dimensional frequentist and Bayesian inferential tools under sparse models that can yield uncertainty quantification measures, including confidence (or Bayesian credible) intervals, p values, and false discovery rates (FDR). Connections in high-dimensional inference between the two realms have been fully exploited under the "unpenalized loss function + penalty term" formulation for regularization methods and the "likelihood function × shrinkage prior" framework for regularized Bayesian analysis. In particular, we advocate for robust Bayesian variable selection in cancer genomics studies due to its ability to accommodate disease heterogeneity in the form of heavy-tailed errors and structured sparsity while providing valid statistical inference. The numerical results show that robust Bayesian analysis incorporating exact sparsity yields not only superior estimation and identification results but also Bayesian credible intervals that attain the nominal coverage probabilities, compared with alternative methods, especially in the presence of heavy-tailed model errors and outliers.
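False discovery rates are one of the uncertainty measures the review covers. As a hedged, generic illustration (the review may favor other FDR procedures, including Bayesian ones based on posterior inclusion probabilities), the sketch below implements the classical Benjamini-Hochberg step-up procedure. The function name `benjamini_hochberg` is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha.

    Rejects the hypotheses with the k smallest p values, where k is the largest
    rank i such that p_(i) <= alpha * i / m."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()   # largest (0-based) rank passing its threshold
        reject[order[: k + 1]] = True     # step-up: reject every hypothesis up to that rank
    return reject


pvals = [0.02, 0.03, 0.04, 0.045]
print(benjamini_hochberg(pvals))  # step-up rejects all four, even though p_(1) > alpha / m
```

The step-up rule is what distinguishes BH from a per-rank cutoff: once the largest passing rank is found, all smaller p values are rejected even if they individually miss their own thresholds.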
Robust Bayesian variable selection for gene-environment interactions
(Wiley, 2022-06) Ren, Jie; Zhou, Fei; Li, Xiaoxi; Ma, Shuangge; Jiang, Yu; Wu, Cen; Biostatistics and Health Data Science, School of Medicine
Gene–environment (G×E) interactions have important implications for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination in disease phenotypes are commonly encountered in G×E studies, which has led to the development of a broad spectrum of robust regularization methods. Nevertheless, within the Bayesian framework, this issue has not been addressed in existing studies. We develop a fully Bayesian robust variable selection method for G×E interaction studies. The proposed Bayesian method can effectively accommodate heavy-tailed errors and outliers in the response variable while conducting variable selection that accounts for structured sparsity. In particular, for robust sparse group selection, spike-and-slab priors are imposed at both the individual and group levels to identify important main and interaction effects robustly. An efficient Gibbs sampler has been developed to facilitate fast computation. Extensive simulation studies, analysis of diabetes data with single-nucleotide polymorphism measurements from the Nurses' Health Study, and analysis of The Cancer Genome Atlas melanoma data with gene expression measurements demonstrate the superior performance of the proposed method over multiple competing alternatives.
Robust Bayesian variable selection for gene–environment interactions
(Oxford University Press, 2023) Ren, Jie; Zhou, Fei; Li, Xiaoxi; Ma, Shuangge; Jiang, Yu; Wu, Cen; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Gene-environment (G×E) interactions have important implications for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination in disease phenotypes are commonly encountered in G×E studies, which has led to the development of a broad spectrum of robust regularization methods. Nevertheless, within the Bayesian framework, this issue has not been addressed in existing studies. We develop a fully Bayesian robust variable selection method for G×E interaction studies. The proposed Bayesian method can effectively accommodate heavy-tailed errors and outliers in the response variable while conducting variable selection that accounts for structured sparsity. In particular, for robust sparse group selection, spike-and-slab priors are imposed at both the individual and group levels to identify important main and interaction effects robustly. An efficient Gibbs sampler has been developed to facilitate fast computation. Extensive simulation studies, analysis of diabetes data with single-nucleotide polymorphism measurements from the Nurses' Health Study, and analysis of The Cancer Genome Atlas melanoma data with gene expression measurements demonstrate the superior performance of the proposed method over multiple competing alternatives.
Sparse group variable selection for gene-environment interactions in the longitudinal study
(Wiley, 2022) Zhou, Fei; Lu, Xi; Ren, Jie; Fan, Kun; Ma, Shuangge; Wu, Cen; Biostatistics and Health Data Science, School of Medicine
Penalized variable selection for high-dimensional longitudinal data has received much attention because it can account for the correlation among repeated measurements while providing additional and essential information for improved identification and prediction performance. Despite this success, the potential of penalization methods for accommodating structured sparsity in longitudinal studies is far from fully understood. In this article, we develop a sparse group penalization method to conduct a bi-level gene-environment (G×E) interaction study with a repeatedly measured phenotype. Within the quadratic inference function (QIF) framework, the proposed method can simultaneously identify main and interaction effects at both the group and individual levels. Simulation studies have shown that the proposed method outperforms major competitors. In the case study of asthma data from the Childhood Asthma Management Program (CAMP), we conduct a G×E study using high-dimensional SNP data as genetic factors and the longitudinal trait, forced expiratory volume in one second (FEV1), as the phenotype. Our method leads to improved prediction and identification of main and interaction effects with important implications.
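Bi-level selection means a whole group can be dropped, and individual members of a retained group can also be dropped. A common penalty with this behavior is the sparse group lasso, and the sketch below shows its proximal operator, the building block of many proximal-gradient solvers. This is an assumption-labeled illustration: the exact penalty in the QIF-based method may differ, the QIF loss itself is not implemented, and `sparse_group_prox` is a hypothetical name. The operator first soft-thresholds each coefficient, then shrinks each group's thresholded vector toward zero by its Euclidean norm.

```python
import numpy as np


def sparse_group_prox(z, groups, lam1, lam2, step=1.0):
    """Proximal operator of step * (lam1 * ||b||_1 + lam2 * sum_g sqrt(p_g) * ||b_g||_2).

    Computed in two stages: elementwise soft-thresholding (individual level),
    then groupwise shrinkage of the thresholded vector (group level)."""
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    u = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)
    out = np.zeros_like(u)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(u[idx])
        if norm > 0.0:
            weight = np.sqrt(idx.sum())   # sqrt(group size), a common choice
            out[idx] = u[idx] * max(0.0, 1.0 - step * lam2 * weight / norm)
    return out


z = np.array([3.0, 0.1, 0.5, -0.5])
groups = np.array([1, 1, 2, 2])
print(sparse_group_prox(z, groups, lam1=0.3, lam2=0.5))
```

Here group 2 survives the elementwise step but is removed at the group level, while in group 1 only the small second coefficient is zeroed: selection happens at both levels.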
Springer: An R package for bi-level variable selection of high-dimensional longitudinal data
(Frontiers Media, 2023-04-06) Zhou, Fei; Liu, Yuwen; Ren, Jie; Wang, Weiqun; Wu, Cen; Biostatistics and Health Data Science, School of Medicine
In high-dimensional data analysis, bi-level (or sparse group) variable selection can simultaneously conduct penalization at the group level and within groups, and it has been developed for continuous, binary, and survival responses in the literature. Zhou et al. (2022) (PMID: 35766061) further extended it to longitudinal responses by proposing a quadratic inference function (QIF)-based penalization method for gene–environment interaction studies. This study introduces “springer,” an R package implementing the bi-level variable selection within the QIF framework developed in Zhou et al. (2022). In addition, the R package “springer” implements the generalized estimating equation-based sparse group penalization method. Alternative methods focusing only on the group level or only on the individual level are also provided by the package. In this study, we systematically introduce the longitudinal penalization methods implemented in the “springer” package. We demonstrate the usage of the core and supporting functions, followed by numerical examples and discussions. R package “springer” is available at https://cran.r-project.org/package=springer.
The Bayesian Regularized Quantile Varying Coefficient Model
(Elsevier, 2023) Zhou, Fei; Ren, Jie; Ma, Shuangge; Wu, Cen; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
The quantile varying coefficient (VC) model can flexibly capture dynamic patterns of regression coefficients. In addition, because of the quantile check loss function, it is robust against outliers and heavy-tailed distributions of the response variable, and it can provide a more comprehensive picture of modeling by exploring the conditional quantiles of the response variable. Although extensive studies have examined variable selection for high-dimensional quantile varying coefficient models, Bayesian analysis has rarely been developed. The Bayesian regularized quantile varying coefficient model has been proposed to incorporate robustness against data heterogeneity while accommodating non-linear interactions between the effect modifier and predictors. Important varying coefficients can be selected through Bayesian variable selection. Incorporating multivariate spike-and-slab priors further improves performance by inducing exact sparsity. A Gibbs sampler has been derived to conduct efficient posterior inference for the sparse Bayesian quantile VC model through Markov chain Monte Carlo (MCMC). The advantage of the proposed model in selection and estimation accuracy over the alternatives has been systematically investigated in simulations under specific quantile levels and multiple heavy-tailed model errors. In the case study, the proposed model identifies biologically sensible markers in a non-linear gene-environment interaction study using the NHS data.
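The robustness attributed to the quantile check loss can be seen in its simplest setting. The sketch below uses the standard check loss, rho_tau(u) = u * (tau - I(u < 0)), and minimizes its sum over a constant, which recovers a sample tau-quantile. It leaves out the varying coefficients, priors, and Gibbs sampler described above, and the helper names are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0)), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def quantile_by_check_loss(y, tau):
    """Minimize sum_i rho_tau(y_i - c) over c; a minimizer is always attained at a data point."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).sum() for c in y]
    return y[int(np.argmin(losses))]


y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # one gross outlier
print(np.mean(y))                            # pulled toward the outlier
print(quantile_by_check_loss(y, 0.5))        # the median, unaffected by the outlier
```

Because the loss grows linearly rather than quadratically in the residual, a single extreme response moves the fitted quantile far less than it moves a least-squares fit; varying tau traces out different parts of the conditional distribution.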
