Biostatistics Department Theses and Dissertations

Recent Submissions

  • Item
    Identify Signature Genes/Pathways to Characterize Alzheimer's Disease Subtypes Based on Uncoupled Tauopathies and Cognitive Decline
    (2024-06) Huang, Xiaoqing; Huang, Kun; Zhang, Jie; Johnson, Travis; Zhang, Jianjun
    Alzheimer's disease (AD) is a slow-progressing dementia usually found in the elderly, with heterogeneous clinical phenotypes and possibly distinct underlying mechanisms. Widespread tauopathy is one of the pathological hallmarks of AD brains, in which the microtubule protein tau forms scar-like neurofibrillary tangles that kill neurons. However, in subgroups of patients, tauopathy progression does not match cognitive decline. A detailed study of these so-called atypical AD patients allows for a deeper understanding of the various possible disease mechanisms and the factors contributing to disease vulnerability or resilience, which can help guide drug development and treatment strategies tailored to different subgroups, as well as establish foundations for disease prevention. By identifying specific molecular biomarkers associated with each subtype, I hope to help clinicians diagnose various AD subtypes at an earlier stage. In this work, I have performed transcriptomic and proteomic characterization of two atypical AD subtypes in two large AD/normal brain cohorts to further understand the role of tauopathy in AD etiology, identified several pathways that are associated with the two phenotypes’ AD-resilient and AD-vulnerable characteristics, and tried to identify potential drug targets for the precision treatment of AD using extensive bioinformatic approaches. In the meantime, two methodologies were developed and applied. One is a new type of interpretable deep learning model (ParsVNN), which couples the neural network architecture with the hierarchical structure of the gene/protein pathways; it is leveraged to address the complexity and improve interpretability by making the biological hierarchy simple and specific to the predicted subgroup.
The other is a label-transfer approach that uses optimal transport from brain samples to blood samples, in the hope of finding serum biomarkers for atypical AD groups in live patients and predicting their disease progression in a non-invasive fashion. In conclusion, the study improves our understanding of AD etiology and leads to more personalized care and disease prevention. It acknowledges the complexity of the disease and aims to uncover mechanistic distinctions within the broad Alzheimer’s disease spectrum.
  • Item
    Modified 3+3 Design for MTD Re-estimation
    (2024-06) Zhang, Tianshu; Zang, Yong; Han, Yan; Liu, Ziyue
    The 3+3 clinical trial design is one of the most popular dose-finding designs used in phase I oncology trials to identify the maximum tolerated dose (MTD) for new treatment regimens. While this design is widely used due to its simplicity, it has some notable limitations, including a maximum of six patients per dose level and fixed target toxicity rates. To address these issues, we propose a modified 3+3 design that extends the traditional 3+3 design by treating the remaining patients at the MTD level for additional dose-limiting toxicity (DLT) assessment. This modification allows for a more flexible and accurate way to identify the MTD, enhanced by the use of isotonic regression to calculate DLT rates. To compare the modified 3+3 design with the traditional 3+3 design, computer simulation studies were carried out under various dose-toxicity scenarios. The results show that the modified 3+3 design yields higher accuracy in MTD identification.
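The isotonic-regression step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual pool-adjacent-violators algorithm (PAVA) weighted by the number of patients treated at each dose; the function names and the example counts are hypothetical and are not taken from the thesis.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: closest non-decreasing fit to `values`."""
    # Each block stores [weighted mean, total weight, number of doses pooled].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def isotonic_dlt_rates(n_dlt, n_treated):
    """Monotone DLT-rate estimates per dose level (assumes n_treated > 0 everywhere)."""
    raw = [d / n for d, n in zip(n_dlt, n_treated)]
    return pava(raw, n_treated)


# Hypothetical counts: the raw rates 0/3, 2/6, 1/6 are not monotone in dose,
# so the two highest doses are pooled into a single estimate.
rates = isotonic_dlt_rates(n_dlt=[0, 2, 1], n_treated=[3, 6, 6])
```

Pooling enforces the assumption that toxicity does not decrease with dose, which is why the sketch weights each dose by its sample size rather than averaging raw rates equally.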
  • Item
    Transparent and Efficient Designs for Clinical Trials
    (2024-05) Qiu, Yingjie; Zhao, Yi; Zang, Yong; Perkins, Susan; Zhang, Pengyue; Yan, Jingwen
    Modern early phase clinical trials are integral in assessing the efficacy and safety of new treatments. Traditional methodologies rely heavily on complex parametric models to determine dose-response relationships. They come with inherent challenges: difficulty in practical validation, potential for poor performance if the parametric assumptions are misspecified, and a heavy learning burden for medical practitioners. The need for novel methods that bridge the gap between statistical robustness and clinical applicability is evident. To address these issues, we propose two transparent and efficient designs. The modified isotonic regression based phase I/II clinical trial design (mISO) and the utility-based model free phase I/II design (UFO) represent innovative strides in identifying optimal doses for clinical trials. The mISO design, eschewing traditional parametric assumptions, offers a transparent and efficient method, adaptable to various dose-response curves and enhanced by the mISO-B extension for delayed outcomes. In parallel, the UFO design, specifically tailored for immunotherapy trials, diverges from complex models to employ a dynamic, utility-based approach. This approach continuously updates with trial data, optimizing dose allocation for each patient cohort. In comprehensive simulation studies, both designs demonstrated superior performance compared with existing methods. Several sequential methods populate the statistical literature, but there remains a notable gap in addressing secondary objectives without altering the primary aim. To address this gap, we introduce a two-stage group sequential design for randomized controlled trials that sequentially tests superiority and noninferiority. This strategy primarily aims to establish the superiority of a treatment, assessed at both interim and final stages.
Uniquely, it shifts to test noninferiority only if the superiority criterion is not met at the end of the second stage. This dual-focus approach is particularly appreciated in clinical settings for its practical application. Furthermore, it provides a valuable alternative in scenarios where achieving sufficient power for the superiority objective is hindered by limited participant recruitment, allowing the study to pivot towards demonstrating noninferiority.
  • Item
    Statistical Methods for Cancer Research
    (2024-01) Han, Yan; Zhao, Yi; Tu, Wanzhu; Li, Yang; Zhang, Jianjun
    Phase I/II clinical trial design is pivotal for achieving optimal therapeutic effect in immunotherapy and drug combination therapy for cancer treatment. Additionally, the identification of biomarkers associated with the risk of severe complications during cancer therapy is a crucial research area. This dissertation contains three related topics, which focus on adaptive Phase I/II clinical trial design and the identification of biomarkers relevant to cancer research. The first topic focuses on developing a two-stage nonparametric (TSNP) phase I/II clinical trial design to identify the optimal biological dose (OBD) of immunotherapy. We derive the closed-form estimates of the joint toxicity-efficacy response probabilities under the monotonic increasing constraint for the toxicity outcomes. The first stage of the design aims to explore the toxicity profile. The second stage aims to find the OBD through a utility function. The simulation results show that the TSNP design yields operating characteristics superior to those of the existing Bayesian parametric designs. User-friendly computational software is freely available to facilitate the application of the proposed design to real trials. The second topic focuses on dose optimization in drug-combination trials. We propose the Great Wall design, which employs a "divide-and-conquer" algorithm to address the issue of partial order of toxicity. It constructs a candidate set of the most promising dose combinations using the mean utility method. The patients assigned to the candidate set are followed to collect the survival outcomes, and the final optimal dose combination is then selected to maximize the survival benefit. A simulation study confirmed the desirable operating characteristics of the Great Wall design, compared with other conventional phase I/II designs for drug-combination trials.
The last topic of my dissertation is the prospective assessment of risk biomarkers of sinusoidal obstruction syndrome (SOS) after hematopoietic cell transplantation (HCT). We aimed to define risk groups for SOS occurrence using three proteins, L-Ficolin, Hyaluronic Acid (HA), and Stimulation-2 (ST2), by assessing SOS incidence at day 35 post-HCT and overall survival (OS) at day 100 post-HCT. We conclude that L-Ficolin, HA, and ST2 levels measured as early as three days post-HCT improved risk stratification for SOS occurrence and OS.
  • Item
    Bayesian Adaptive Designs for Early Phase Clinical Trials
    (2023-07) Guo, Jiaying; Zang, Yong; Han, Jiali; Zhao, Yi; Ren, Jie
    Delayed toxicity outcomes are common in phase I clinical trials, especially in oncology studies. They cause logistical difficulties, waste resources, and prolong the trial duration. We propose the time-to-event 3+3 (T-3+3) design to solve the delayed outcome issue for the 3+3 design. We convert the dose decision rules of the 3+3 design into a series of events. A transparent yet efficient Bayesian probability model is applied to calculate the event probabilities in the presence of delayed outcomes, taking the pending patients' informative remaining follow-up time into consideration. The T-3+3 design only models the information for the pending patients and seamlessly reduces to the conventional 3+3 design in the absence of delayed outcomes. We further extend the proposed method to the interval 3+3 (i3+3) design, an algorithm-based phase I dose-finding design built on simple but more comprehensive rules that account for the variability in the observed data. Similarly, the dose escalation/de-escalation decision is recommended by comparing the event probabilities, which are calculated by considering the ratio between the averaged follow-up time for at-risk patients and the total assessment window. We evaluate the operating characteristics of the proposed designs through simulation studies and compare them to existing methods. The umbrella trial, which evaluates multiple investigational drugs in different subgroups of patients with the same disease, is a clinical trial strategy that accommodates the paradigm shift towards personalized medicine. A Bayesian adaptive umbrella trial design is proposed to select effective targeted agents for different biomarker-based subgroups of patients. To facilitate treatment evaluation, the design uses a mixture regression model that jointly models short-term and long-term response outcomes.
In addition, a data-driven latent class model is employed to adaptively combine subgroups into induced latent classes based on overall data heterogeneities, which improves the statistical power of the umbrella trial. To enhance individual ethics, the design includes a response-adaptive randomization scheme with early stopping rules for futility and superiority. Bayesian posterior probabilities are used to make these decisions. Simulation studies demonstrate that the proposed design outperforms two conventional designs across a range of practical treatment-outcome scenarios.
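As a rough illustration of how Bayesian posterior probabilities can drive futility and superiority stopping, the sketch below swaps in a much simpler model than the one the abstract describes: a single arm with a binary response and a uniform Beta(1, 1) prior, in place of the mixture regression and latent class models. The function names and the cut-offs 0.05 and 0.95 are assumptions chosen only for the example.

```python
from math import comb


def posterior_prob_exceeds(responses, n, threshold):
    """P(response rate > threshold | data) under a uniform Beta(1, 1) prior.

    The posterior is Beta(responses + 1, n - responses + 1). For integer
    parameters its upper tail equals a binomial CDF:
    P(Beta(a, b) > p) = P(Binomial(a + b - 1, p) <= a - 1).
    """
    if not 0 <= responses <= n:
        raise ValueError("responses must lie between 0 and n")
    m = n + 1
    return sum(
        comb(m, k) * threshold**k * (1 - threshold) ** (m - k)
        for k in range(responses + 1)
    )


def interim_decision(responses, n, threshold,
                     futility_cut=0.05, superiority_cut=0.95):
    """Illustrative stopping rule; both cut-offs are assumed values."""
    prob = posterior_prob_exceeds(responses, n, threshold)
    if prob < futility_cut:
        return "stop for futility"
    if prob > superiority_cut:
        return "stop for superiority"
    return "continue"


# Hypothetical interim look: 3 responders among 10 patients, reference rate 0.3.
decision = interim_decision(responses=3, n=10, threshold=0.3)
```

The binomial identity keeps the calculation exact and free of external dependencies; a real design would calibrate the cut-offs by simulation.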
  • Item
    Sparse Latent-Space Learning for High-Dimensional Data: Extensions and Applications
    (2023-05) White, Alexander James; Cao, Sha; Tu, Wanzhu; Zhang, Chi; Zhao, Yi
    The successful treatment and potential eradication of many complex diseases, such as cancer, begin with elucidating the convoluted mapping of molecular profiles to phenotypical manifestation. Our observed molecular profiles (e.g., genomics, transcriptomics, epigenomics) are often high-dimensional and are collected from patient samples falling into heterogeneous disease subtypes. Interpretable learning from such data calls for sparsity-driven models. This dissertation addresses the high dimensionality, sparsity, and heterogeneity issues when analyzing multiple-omics data, where each method is implemented with a concomitant R package. First, we examine challenges in submatrix identification, which aims to find subgroups of samples that behave similarly across a subset of features. We resolve issues such as two-way sparsity, non-orthogonality, and parameter tuning with an adaptive thresholding procedure on the singular vectors computed via orthogonal iteration. We validate the method with simulation analysis and apply it to an Alzheimer’s disease dataset. The second project focuses on modeling relationships between large, matched datasets. Exploring regression structures between large datasets can provide insights such as the effect of long-range epigenetic influences on gene expression. We present a high-dimensional version of mixture multivariate regression to detect patient clusters, each with different correlation structures of matched-omics datasets. Results are validated via simulation and applied to matched-omics datasets. In the third project, we introduce a novel approach to modeling spatial transcriptomics (ST) data with a spatially penalized multinomial model of the expression counts. This method resolves the low-rank structures of zero-inflated ST data with spatial smoothness constraints. We validate the model using manual cell structure annotations of human brain samples. We then apply this technique to additional ST datasets.
  • Item
    Insights in Response to Statewide COVID-19 Sampling in Indiana
    (2023-05) Shields, David William, Jr.; Yiannoutsos, Constantin; Fadel, William; Bakoyannis, Giorgos
    During 2020, the Indiana State Department of Health conducted a longitudinal study of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of COVID-19, to understand the number of past and current infections as well as the prevalence of disease in the State of Indiana. Participants completed a survey and were tested for exposure to SARS-CoV-2. The study comprised three waves of testing, spaced months apart, each consisting of a random sample and a non-random sample. The non-random sample was used to ensure the sample population was representative of the State of Indiana and was used as a stratum in the logistic regression model, allowing for adjustment for nonresponse. The findings indicate that persons of non-White race and persons of Hispanic ethnicity had the highest risk of exposure to the virus. Understanding health disparities among racial and ethnic populations, addressing how different communities are impacted by the pandemic, and working with the community are paramount when attempting to mitigate a pandemic. In addition, understanding the data from the ambient pandemic when instituting measures to mitigate the spread of viruses is extremely important for managing health emergencies such as the COVID-19 pandemic.
  • Item
    Single-cell Approach to Repurposing of Drugs for Alzheimer’s Disease
    (2023-05) Peyton, Madeline Elizabeth; Johnson, Travis S.; Zhang, Jie; Zhang, Pengyue
    Background: Alzheimer’s disease (AD) is the third leading cause of death for the older demographic in the United States, just after heart disease and cancer. However, unlike heart disease and cancer, the death rates for AD are increasing. Despite extensive research, the cause or origin of AD remains unclear and there is no existing cure. However, with the improvement of single-cell RNA-sequencing (scRNA-seq) technologies and drug repurposing tools, we can further our knowledge of AD and its pathogenesis. Method: Our primary aim was to identify repurposable drug and compound candidates for AD treatment and identify significant cell types and signaling pathways using two scRNA-seq datasets from cortex samples of AD patients and controls. To achieve this aim, we generated differential gene expression profiles, calculated log fold-changes, and estimated standard errors to make pairwise comparisons between the diseased and healthy samples. We used the 21,304 drugs/compounds with response gene expression profiles in 98 cell lines from the LINCS L1000 project to detect consistent differentially expressed genes (DEGs) that were either i) up-regulated in cells of diseased samples and down-regulated in cells with treatment, or ii) down-regulated in cells from diseased samples but up-regulated in cells with treatment. To evaluate these identified drugs, we compared the p-value, false discovery rate (FDR), and A Single-cell Guided Pipeline to Aid Repurposing of Drugs (ASGARD) drug score for each cell type. We further annotated and assessed doublet cell types within the Grubman et al. dataset using cell type proportions. Result: The analysis provided several potential therapeutic treatments for AD and their target genes and pathways, as well as important cell type interactions. Notably, we identified an interaction between endothelial cells and microglia, and further identified drug candidates to target this interaction.
Conclusion: We identified repurposable drug/compound candidates in each dataset that were also identified in the literature. We further identified doublet cell type interactions of interest and drugs that target these interactions.
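The reversal criterion in the Method section (genes up-regulated in disease but down-regulated under treatment, or the opposite) can be sketched as a simple sign comparison of log fold-changes. This is only an illustration of that criterion, not the ASGARD scoring procedure; the function name, the `min_abs_lfc` filter, and the placeholder gene labels are assumptions.

```python
def reversed_genes(disease_lfc, drug_lfc, min_abs_lfc=0.0):
    """Genes whose disease and drug-response log fold-changes have opposite signs.

    disease_lfc, drug_lfc: dicts mapping gene label -> log fold-change.
    min_abs_lfc: hypothetical filter; a gene is skipped unless both
    fold-changes exceed this magnitude.
    """
    hits = []
    for gene, disease_value in disease_lfc.items():
        drug_value = drug_lfc.get(gene)
        if drug_value is None:
            continue  # gene not measured in the drug response profile
        if abs(disease_value) <= min_abs_lfc or abs(drug_value) <= min_abs_lfc:
            continue
        if disease_value * drug_value < 0:  # opposite directions
            hits.append(gene)
    return sorted(hits)


# Placeholder labels and values, not real expression data.
disease = {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.5, "GENE_D": -1.0}
drug = {"GENE_A": -0.9, "GENE_B": 0.4, "GENE_C": 0.7}
candidates = reversed_genes(disease, drug)
```

In practice the comparison would be run per cell type and combined with significance measures such as the p-value and FDR mentioned in the abstract.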
  • Item
    Marginal Regression Analysis of Clustered and Incomplete Event History Data
    (2022-12) Zhou, Wenxian; Bakoyannis, Giorgos; Zhang, Ying; Yiannoutsos, Constantin T.; Zang, Yong; Hasan, Mohammad Al
    Event history data, including competing risks and more general multistate process data, are commonly encountered in biomedical studies. In practice, such event history data are often subject to intra-cluster correlation in multicenter studies and are complicated due to informative cluster size, a situation where the outcomes under study are associated with the size of the cluster. In addition, outcomes or covariates are frequently incompletely observed in real-world settings. Ignoring these statistical issues will lead to invalid inferences. In this dissertation, I develop a series of marginal regression methods to address these statistical issues with competing risks and more general multistate process data. The motivation for this research comes from a large multicenter HIV study and a multicenter randomized oncology trial. First, I propose a marginal regression method for clustered competing risks data with missing cause of failure. I consider the semiparametric proportional cause-specific hazards model and propose a maximum partial pseudolikelihood estimator under a plausible missing at random assumption. Second, I consider more general clustered multistate process data and propose a marginal regression framework for the transient state occupation probabilities. The proposed method is based on a weighted functional generalized estimating equation approach. A nonparametric hypothesis test for the covariate effect is also provided. Third, I extend the proposed framework in the second part of the dissertation to account for missing covariates, via a weighted functional pseudo-expected estimating equation approach. I conduct extensive simulation studies to evaluate the finite sample performance of the proposed methods. The proposed methods are applied to the motivating multicenter HIV study and oncology trial datasets.
  • Item
    Group Specific Dynamic Models of Time Varying Exposures on a Time-to-Event Outcome
    (2022-12) Tong, Yan; Gao, Sujuan; Bakoyannis, Giorgos; Tu, Wanzhu; Han, Jiali
    Time-to-event outcomes are widely utilized in medical research. Assessing the cumulative effects of time-varying exposures on time-to-event outcomes poses challenges in statistical modeling. First, exposure status, intensity, or duration may vary over time. Second, exposure effects may be delayed over a latent period, a situation that is not considered in traditional survival models. Third, exposures that occur within a time window may cumulatively influence an outcome. Fourth, such cumulative exposure effects may be non-linear over the exposure latent period. Lastly, exposure-outcome dynamics may differ among groups defined by individuals' characteristics. These challenges have not been adequately addressed in current statistical models. The objective of this dissertation is to provide a novel approach to modeling group-specific dynamics between cumulative time-varying exposures and a time-to-event outcome. A framework of group-specific dynamic models is introduced utilizing functional time-dependent cumulative exposures within an etiologically relevant time window. Penalized-spline time-dependent Cox models are proposed to evaluate group-specific outcome-exposure dynamics through the associations of a time-to-event outcome with functional cumulative exposures and group-by-exposure interactions. Model parameter estimation is achieved by penalized partial likelihood. Hypothesis testing for comparison of group-specific exposure effects is performed by Wald-type tests. These models are extended to group-specific non-linear exposure intensity-latency-outcome relationships and group-specific interaction effects from multiple exposures. Extensive simulation studies are conducted and demonstrate satisfactory model performance. The proposed methods are applied to the analyses of group-specific associations between antidepressant use and time to coronary artery disease in a depression-screening cohort using data extracted from electronic medical records.
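The idea of a cumulative exposure within a time window can be made concrete with a small sketch. It assumes a fixed, user-supplied weight function over the lag, whereas the dissertation estimates such effects with penalized splines inside a Cox model; the function name, the half-open window convention, and the example values are hypothetical.

```python
def cumulative_exposure(exposures, t, window, weight=None):
    """Weighted sum of exposure intensities recorded in the window (t - window, t].

    exposures: iterable of (time, intensity) pairs.
    weight: optional function of the lag (t - time); defaults to equal weighting.
    """
    if weight is None:
        weight = lambda lag: 1.0
    total = 0.0
    for time, intensity in exposures:
        lag = t - time
        if 0 <= lag < window:  # keep exposures inside the look-back window
            total += weight(lag) * intensity
    return total


# Hypothetical exposure history: (time, dose intensity).
history = [(1, 10.0), (3, 20.0), (6, 5.0)]

# Equal weighting over a 4-unit look-back window ending at time 6.
unweighted = cumulative_exposure(history, t=6, window=4)

# Linearly decaying weight, so older exposures contribute less.
decayed = cumulative_exposure(history, t=6, window=4,
                              weight=lambda lag: 1 - lag / 4)
```

Recomputing this quantity at each event time yields a time-dependent covariate of the kind a time-dependent Cox model can take as input.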