Biostatistics Department Theses and Dissertations

Recent Submissions

Now showing 1 - 10 of 65
  • Item
    A Bayesian Design For Platform Trials With Temporal Changes
    (2025-05) Zhang, Chen; Zang, Yong; Fadel, William F.; Zhang, Pengyue
    The platform trial, which aims to find the best treatment for a disease by sequentially investigating multiple treatments within a single trial, has become increasingly popular in recent decades. An inherent problem in platform trials is how to borrow information from non-concurrent controls to improve the efficiency of statistical inference. The simple approach of directly pooling all control patients does not work because of population heterogeneity between concurrent and non-concurrent controls. Temporal changes, which affect patients’ responses over time, are a major source of this heterogeneity. In this work, we develop a Bayesian design that evaluates treatment effects in platform trials while accounting for temporal changes. We treat each cohort of patients as a matched set and develop a conditional likelihood method to eliminate the impact of temporal changes. The performance of the proposed method is evaluated through simulation studies.
  • Item
    Transcriptomic Analysis of Survival of Pulmonary Arterial Hypertension Patients
    (2025-05) Gomez Aleman, Adrian; Liu, Yunlong; Schwantes-An, Tae-Hwi Linus; Fadel, William; Reiter, Jill
    Pulmonary arterial hypertension (PAH) is a rare and often fatal condition characterized by obliterative pulmonary arterial (PA) remodeling, inflammation, and metabolic reprogramming, leading to increased pulmonary vascular resistance (PVR) and right heart failure. To elucidate the genetic causes of disease risk, progression, and outcomes in PAH, many genetic studies, including genome-wide association studies (GWAS), have been conducted. These efforts have identified both rare and common genetic variants that alter the risk of developing PAH. However, the genetic underpinnings of outcomes in PAH remain largely unknown. To address this crucial gap in developing treatments for PAH, we sought to identify transcriptomic signatures that stratify the hazard of all-cause mortality among patients with PAH, using the PAH Biobank, which includes over 1,000 patients with PAH from diverse genetic ancestry groups. Using available whole-blood RNA-Seq data, we conducted a survival analysis of all-cause mortality or transplant, stratified by genetic ancestry group, using the Cox proportional hazards model. RNA-Seq data were quantified using SALMON and normalized using the DESeq2 package in R. Both normalized and tertile gene expression levels were tested for association with survival, adjusting for age at diagnosis, sex, type of PAH, PVR, neutrophils, and five principal components. A two-stage analysis was performed, with EUR as the discovery cohort and AFR and AMR as two independent replication cohorts. A Bonferroni correction was applied to adjust for the number of discovery tests conducted. In total, 848 EUR (European genetic ancestry), 81 AFR (African genetic ancestry), and 103 AMR (Admixed American genetic ancestry) participants were included in the analyses. In the discovery cohort, 45,915 genes were tested, and eight genes were significantly associated with the hazard. Three gene associations (REXO2, FHL2, and CABP4) were replicated (p < 0.05 with the same direction of effect on the hazard) in both replication cohorts (AFR and AMR). Using one of the largest cohorts of patients with PAH, we identified three genes that are significantly associated with all-cause mortality across populations. These genes represent potential targets for therapeutic development and for understanding the biological underpinnings of disease progression in PAH.
  • Item
    Comparing Nanopore to MethylationEPIC Array and EM-Seq in DNA Methylation Detection
    (2024-12) Brooks, Steven; Liu, Yunlong; Peng, Gang; Zhang, Pengyue
    DNA methylation is an important biological process in epigenetics, and many methods have been developed to profile it. Recently, a growing number of studies have used Nanopore long-read sequencing to detect DNA methylation, in contrast to the widely used Infinium arrays and short-read whole-genome sequencing (WGS) methods. In this study, we evaluate the performance of Nanopore sequencing in DNA methylation detection by comparing it with the Illumina MethylationEPIC microarray (EPIC) and Enzymatic Methyl-Sequencing (EM-Seq). We first compared Oxford Nanopore Technologies’ Nanopore sequencing with the MethylationEPIC array. Among the ~850,000 CpG sites covered by both methods, we observed high concordance (R ≥ 0.94 across all four samples). After downsampling the Nanopore data from an average coverage of 26.6 reads per site to 10 reads per site, the correlation in CpG methylation remained high (R ≥ 0.935). Next, we compared Nanopore with EM-Seq at low coverage. The lower CpG methylation correlation (R ≥ 0.8) can be attributed to reduced coverage of hypomethylated CpG sites by EM-Seq. Furthermore, we highlight Nanopore’s unique capabilities, including native DNA sequencing that can differentiate modification types and the use of long reads for haplotype phasing. Overall, Nanopore demonstrated high concordance with the EPIC array and more uniform coverage across the genome than EM-Seq. This study provides insights for researchers selecting DNA methylation detection methods, considering factors such as cost, DNA input, and the complexity of downstream analysis.
  • Item
    Bayesian Adaptive Designs for Phase II Clinical Trials Evaluating Subgroup-Specific Treatment Effect
    (2024-12) Shan, Mu; Zang, Yong; Han, Jiali; Tu, Wanzhu; Zhang, Pengyue
    In Phase II clinical trials, particularly for molecularly targeted agents (MTAs) and biotherapies, there is a critical need to evaluate subgroup-specific treatment effects because of the heterogeneous nature of these therapies. This dissertation introduces two innovative Bayesian adaptive designs for biomarker-guided clinical trials: the Bayesian Order Constrained Adaptive (BOCA) design and the Bayesian Adaptive Marker-Stratified Design Using Calibrated Spike-and-Slab priors (SSS). The BOCA design addresses the limitations of the "one-size-fits-all" approach in non-randomized Phase II trials by efficiently detecting subgroup-specific treatment effects. It combines elements of enrichment and sequential designs, starting with an "all-comers" stage and transitioning to an enrichment stage based on interim analysis results. The decision to continue with the marker-positive or the marker-negative subgroup is guided by two posterior probabilities that exploit inherent ordering constraints. This adaptive approach improves trial efficiency and cost-effectiveness while accommodating missing biomarker data. Comprehensive simulation studies show that the BOCA design outperforms conventional designs in detecting subgroup-specific treatment effects, making it a robust tool for Phase II trials. The SSS design improves the efficiency of marker-stratified designs (MSD) by leveraging clinical features of biomarkers and treatments. Patients are classified into marker-positive and marker-negative subgroups and randomized to receive either the MTA or a control treatment. The SSS design uses spike-and-slab priors to dynamically share information on response rates across subgroups, governed by two posterior probabilities that assess the similarity of response rates. It also incorporates a Bayesian multiple imputation method to handle missing biomarker profiles. Simulation studies confirm that the SSS design has favorable operating characteristics, outperforming conventional designs in evaluating subgroup-specific treatment effects. Both the BOCA and SSS designs represent significant advances in Bayesian adaptive methodology for Phase II trials. By addressing the limitations of traditional approaches, these designs improve the evaluation of subgroup-specific treatment effects and contribute valuable methodology to the field of personalized medicine.
  • Item
    Statistical Deep Learning of Multivariate Longitudinal Data
    (2024-11) Li, Yunyi; Gao, Sujuan; Liu, Hao; Apostolova, Liana G.; Li, Xiaochun; Zhao, Yi
    Various types of longitudinal data, including continuous, binary, and count data, are increasingly collected in many scientific fields, such as Alzheimer’s disease research. Despite this wealth of data, the complex structure of multivariate longitudinal data presents significant modeling challenges. Researchers have long sought to understand dynamic interactions among multiple components and how interventions affect outcomes over time under complex underlying dynamics. However, statistical methods for modeling these dynamic changes and associations remain limited. To address these gaps, we first propose a novel nonparametric method to describe the mean temporal changes of sparsely and irregularly observed multivariate longitudinal data. This method is based on an ordinary differential equation (ODE) system approximated by neural networks. Furthermore, we present a novel approach that treats the initial values of the ODEs as an unknown parameter vector, a departure from existing methods that either pre-specify the initial values or estimate them in an ad hoc manner. In the second topic, we propose deep latent ODE models. These models nonparametrically model latent temporal trends through an unknown function of an ODE system and parametrically estimate covariate effects using Bayesian approaches. To address the intractability of the posterior distribution of the initial values, we employ a variational autoencoder (VAE) algorithm. The approximate posterior distribution is characterized by a recurrent neural network (RNN), and high-dimensional hyperparameters are estimated by stochastic gradient descent using an objective based on the Kullback-Leibler (KL) divergence. Lastly, we propose Bayesian generalized random effects models for longitudinal data from various distributions, including longitudinal counts and binary outcomes. This model extends traditional generalized linear mixed-effects models (GLMMs) to generalized semi-parametric mixed-effects models. It assumes a nonparametric baseline function with a stochastic process prior, and its parameters are estimated using a Bayesian approach. The proposed model is practical and can be applied to various types of longitudinal data, including binary and count data. Neural ODE, RNN, variational inference, and KL divergence techniques are also applied in this project.
  • Item
    Identify Signature Genes/Pathways to Characterize Alzheimer's Disease Subtypes Based on Uncoupled Tauopathies and Cognitive Decline
    (2024-06) Huang, Xiaoqing; Huang, Kun; Zhang, Jie; Johnson, Travis; Zhang, Jianjun
    Alzheimer's disease (AD) is a slowly progressing dementia usually found in older adults, with heterogeneous clinical phenotypes and possibly diverse underlying mechanisms. Widespread tauopathy is one of the pathological hallmarks of AD brains: the microtubule-associated protein tau forms scar-like neurofibrillary tangles that kill neurons. However, in some subgroups of patients, tauopathy progression does not match cognitive decline. A detailed study of these so-called atypical AD patients allows a deeper understanding of the various possible disease mechanisms and of the factors contributing to disease vulnerability or resilience, which can help guide drug development and treatment strategies tailored to different subgroups, as well as establish foundations for disease prevention. By identifying specific molecular biomarkers associated with each subtype, I hope to help clinicians diagnose AD subtypes at an earlier stage. In this work, I performed transcriptomic and proteomic characterization of two atypical AD subtypes in two large AD/normal brain cohorts to further understand the role of tauopathy in AD etiology, identified several pathways associated with the AD-resilient and AD-vulnerable characteristics of the two phenotypes, and sought to identify potential drug targets for the precision treatment of AD using extensive bioinformatic approaches. In addition, two methodologies were developed and applied. The first is a new interpretable deep learning model (ParsVNN) that couples the neural network architecture with the hierarchical structure of gene/protein pathways; it addresses this complexity and improves interpretability by making the biological hierarchy simple and specific to the predicted subgroup. The second is a label-transfer approach that uses optimal transport to carry labels from brain samples to blood samples, with the aim of finding serum biomarkers for atypical AD groups in living patients and predicting their disease progression non-invasively. In conclusion, this study improves our understanding of AD etiology and may lead to more personalized care and disease prevention. It acknowledges the complexity of the disease and aims to uncover mechanistic distinctions within the broad Alzheimer’s disease spectrum.
  • Item
    Modified 3+3 Design for MTD Re-estimation
    (2024-06) Zhang, Tianshu; Zang, Yong; Han, Yan; Liu, Ziyue
    The 3+3 clinical trial design is one of the most popular dose-finding designs used in phase I oncology trials to identify the maximum tolerated dose (MTD) of new treatment regimens. While this design is widely used because of its simplicity, it has notable limitations, including a maximum of six patients per dose level and fixed target toxicity rates. To address these issues, we propose a modified 3+3 design that extends the traditional 3+3 design by treating the remaining patients at the MTD level for additional dose-limiting toxicity (DLT) assessment. This modification allows the MTD to be identified more flexibly and accurately, with DLT rates estimated by isotonic regression. To compare the modified 3+3 design with the traditional 3+3 design, computer simulation studies were carried out under various dose-toxicity scenarios. The results show that the modified 3+3 design identifies the MTD with higher accuracy.
  • Item
    Transparent and Efficient Designs for Clinical Trials
    (2024-05) Qiu, Yingjie; Zhao, Yi; Zang, Yong; Perkins, Susan; Zhang, Pengyue; Yan, Jingwen
    Early-phase clinical trials are integral to assessing the efficacy and safety of new treatments. Traditional methodologies rely heavily on complex parametric models to determine dose-response relationships. These models come with inherent challenges: they are difficult to validate in practice, they can perform poorly if the parametric assumptions are misspecified, and they impose a heavy learning burden on medical practitioners. Novel methods that bridge the gap between statistical robustness and clinical applicability are clearly needed. To address these issues, we propose two transparent and efficient designs for identifying optimal doses: the modified isotonic regression-based phase I/II clinical trial design (mISO) and the utility-based model-free phase I/II design (UFO). The mISO design avoids traditional parametric assumptions, offering a transparent and efficient method that adapts to various dose-response curves, and its mISO-B extension handles delayed outcomes. In parallel, the UFO design, tailored specifically to immunotherapy trials, replaces complex models with a dynamic, utility-based approach. This approach is continuously updated with trial data, optimizing dose allocation for each patient cohort. In comprehensive simulation studies, both designs outperformed existing methods. Many sequential methods exist in the statistical literature, but there remains a notable gap in addressing secondary objectives without altering the primary aim. To address this, we propose a novel two-stage group sequential design for randomized controlled trials that sequentially tests superiority and noninferiority. This design primarily aims to establish the superiority of a treatment, assessed at both the interim and final stages. Uniquely, it shifts to testing noninferiority only if the superiority criterion is not met at the end of the second stage. This dual-focus approach is particularly valuable in clinical settings because of its practicality. Furthermore, it provides a valuable alternative when limited participant recruitment prevents adequate power for the superiority objective, allowing the study to pivot toward demonstrating noninferiority.
  • Item
    Statistical Methods for Cancer Research
    (2024-01) Han, Yan; Zhao, Yi; Tu, Wanzhu; Li, Yang; Zhang, Jianjun
    Phase I/II clinical trial design is pivotal for achieving optimal therapeutic effects in immunotherapy and drug-combination therapy for cancer. In addition, identifying biomarkers associated with the risk of severe complications during cancer therapy is a crucial research area. This dissertation contains three related topics, which focus on adaptive Phase I/II clinical trial design and the identification of biomarkers relevant to cancer research. The first topic is the development of a two-stage nonparametric (TSNP) phase I/II clinical trial design to identify the optimal biological dose (OBD) of immunotherapy. We derive closed-form estimates of the joint toxicity-efficacy response probabilities under a monotonically increasing constraint on the toxicity outcomes. The first stage of the design explores the toxicity profile; the second stage finds the OBD through a utility function. The simulation results show that the TSNP design has better operating characteristics than existing Bayesian parametric designs. User-friendly software is freely available to facilitate the application of the proposed design to real trials. The second topic focuses on dose optimization in drug-combination trials. We propose the Great Wall design, which employs a "divide-and-conquer" algorithm to address the partial ordering of toxicity. It constructs a candidate set of the most promising dose combinations using the mean utility method. Patients assigned to the candidate set are followed to collect survival outcomes, and the final optimal dose combination is then selected to maximize the survival benefit. A simulation study confirmed the desirable operating characteristics of the Great Wall design compared with other conventional phase I/II designs for drug-combination trials. The last topic of my dissertation is the prospective assessment of risk biomarkers of sinusoidal obstruction syndrome (SOS) after hematopoietic cell transplantation (HCT). We aimed to define risk groups for SOS occurrence using three biomarkers, L-Ficolin, hyaluronic acid (HA), and Stimulation-2 (ST2), by assessing SOS incidence at day 35 post-HCT and overall survival (OS) at day 100 post-HCT. We conclude that L-Ficolin, HA, and ST2 levels measured as early as three days post-HCT improved risk stratification for SOS occurrence and OS.
  • Item
    Bayesian Adaptive Designs for Early Phase Clinical Trials
    (2023-07) Guo, Jiaying; Zang, Yong; Han, Jiali; Zhao, Yi; Ren, Jie
    Delayed toxicity outcomes are common in phase I clinical trials, especially in oncology studies. They cause logistical difficulties, waste resources, and prolong the trial. We propose the time-to-event 3+3 (T-3+3) design to solve the delayed-outcome problem for the 3+3 design. We convert the dose decision rules of the 3+3 design into a series of events. A transparent yet efficient Bayesian probability model is applied to calculate the probabilities of these events in the presence of delayed outcomes, taking into account the informative remaining follow-up time of pending patients. The T-3+3 design models only the information from pending patients and reduces seamlessly to the conventional 3+3 design in the absence of delayed outcomes. We further extend the proposed method to the interval 3+3 (i3+3) design, an algorithm-based phase I dose-finding design with simple yet more comprehensive rules that account for variability in the observed data. Similarly, the dose escalation/de-escalation decision is made by comparing event probabilities that are calculated by accounting for the ratio between the average follow-up time of at-risk patients and the total assessment window. We evaluate the operating characteristics of the proposed designs through simulation studies and compare them with existing methods. The umbrella trial, a clinical trial strategy that accommodates the paradigm shift toward personalized medicine, evaluates multiple investigational drugs in different subgroups of patients with the same disease. We propose a Bayesian adaptive umbrella trial design to select effective targeted agents for different biomarker-based subgroups of patients. To facilitate treatment evaluation, the design uses a mixture regression model that jointly models short-term and long-term response outcomes. In addition, a data-driven latent class model adaptively combines subgroups into induced latent classes based on overall data heterogeneity, which improves the statistical power of the umbrella trial. To enhance individual ethics, the design includes a response-adaptive randomization scheme with early stopping rules for futility and superiority, and these decisions are based on Bayesian posterior probabilities. Simulation studies demonstrate that the proposed design outperforms two conventional designs across a range of practical treatment-outcome scenarios.
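To give a sense of scale for the Bonferroni correction in the pulmonary arterial hypertension survival abstract above, a worked example follows. The abstract does not state the family-wise significance level, nor whether the normalized and tertile analyses count as separate tests, so this sketch assumes α = 0.05 and m = 45,915 discovery tests (one per gene); if both expression codings were counted, m would double and the threshold would halve.

```latex
\alpha_{\text{per test}} = \frac{\alpha}{m} = \frac{0.05}{45{,}915} \approx 1.09 \times 10^{-6}
```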
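Both designs in the Phase II subgroup-specific treatment effect abstract above make decisions from posterior probabilities about response rates. The abstract does not spell out the models, so the sketch below shows only the generic conjugate building block that such Bayesian designs commonly start from, not BOCA or SSS themselves: with a Beta(a, b) prior on a subgroup's response rate and y responses among n patients, the posterior is Beta(a + y, b + n - y), and for integer parameters P(rate > p0) can be computed exactly from a binomial tail. The function names, the uniform default prior, and the threshold p0 are hypothetical.

```python
from math import comb


def beta_sf_integer(p0: float, a: int, b: int) -> float:
    """P(p > p0) for p ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity P(p <= p0) = P(Y >= a) for Y ~ Binomial(a + b - 1, p0),
    so P(p > p0) = P(Y <= a - 1).
    """
    n = a + b - 1
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(a))


def posterior_prob_exceeds(responses: int, n: int, p0: float,
                           prior_a: int = 1, prior_b: int = 1) -> float:
    """Posterior P(response rate > p0) under a Beta(prior_a, prior_b) prior
    (uniform by default) after `responses` of `n` patients respond."""
    # Conjugate update: successes add to a, failures add to b.
    return beta_sf_integer(p0, prior_a + responses, prior_b + n - responses)
```

For example, `posterior_prob_exceeds(5, 10, 0.5)` is 0.5, because the Beta(6, 6) posterior is symmetric about 0.5. A design might then continue enrolling a subgroup when this probability exceeds a prespecified cutoff, which here would be a hypothetical tuning parameter.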
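The modified 3+3 abstract above builds on the conventional 3+3 escalation rule and estimates DLT rates with isotonic regression. A minimal Python sketch of both ingredients follows, assuming the textbook 3+3 rule and the weighted pool-adjacent-violators algorithm (PAVA) for isotonic regression; it is not the thesis's implementation. The MTD selection step (choose the tried dose whose isotonic DLT estimate is closest to a target rate, default 0.3, with ties going to the lower dose) and all function names are hypothetical illustrations.

```python
from typing import List


def three_plus_three(n_treated: int, n_dlt: int) -> str:
    """Conventional 3+3 rule for the current dose (cohorts of 3, max 6)."""
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"
        if n_dlt == 1:
            return "expand"        # treat 3 more patients at this dose
        return "de-escalate"       # >= 2/3 DLTs: dose exceeds the MTD
    if n_treated == 6:
        return "escalate" if n_dlt <= 1 else "de-escalate"
    raise ValueError("the 3+3 rule is defined only for 3 or 6 patients")


def pava(rates: List[float], weights: List[float]) -> List[float]:
    """Weighted pool-adjacent-violators: the non-decreasing sequence
    closest to `rates` in weighted least squares."""
    # Each block holds [pooled value, total weight, number of doses pooled].
    blocks: List[list] = []
    for r, w in zip(rates, weights):
        blocks.append([r, w, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / total, total, n1 + n2])
    # Expand pooled blocks back to one estimate per dose level.
    fitted: List[float] = []
    for value, _, count in blocks:
        fitted.extend([value] * count)
    return fitted


def select_mtd(dlts: List[int], treated: List[int], target: float = 0.3) -> int:
    """Return the 0-based index of the tried dose whose isotonic DLT
    estimate is closest to `target` (hypothetical rule; ties -> lower dose)."""
    tried = [i for i, n in enumerate(treated) if n > 0]
    raw = [dlts[i] / treated[i] for i in tried]
    fitted = pava(raw, [float(treated[i]) for i in tried])
    best = min(range(len(tried)), key=lambda k: (abs(fitted[k] - target), k))
    return tried[best]
```

With 0/3, 2/6, 1/6, and 3/6 DLTs at four doses, the raw rates at the second and third doses (about 0.33 and 0.17) violate monotonicity, so PAVA pools them to 0.25 each, and `select_mtd([0, 2, 1, 3], [3, 6, 6, 6])` returns index 1. Weighting by the number of treated patients means that doses with more patients pull a pooled estimate harder.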