Biostatistics Department Theses and Dissertations
Browsing Biostatistics Department Theses and Dissertations by Title
Now showing 1 - 10 of 63
Item Advanced Modeling of Longitudinal Spectroscopy Data (2014) Kundu, Madan Gopal; Harezlak, Jaroslaw; Randolph, Timothy W.; Sarkar, Jyotirmoy; Steele, Gregory K.; Yiannoutsos, Constantin T.

Magnetic resonance (MR) spectroscopy is a neuroimaging technique widely used to quantify the concentration of important metabolites in brain tissue. An imbalance in the concentration of brain metabolites has been found to be associated with the development of neurological impairment, and there has been an increasing trend of using MR spectroscopy as a diagnostic tool for neurological disorders. We established statistical methodology to analyze data obtained from MR spectroscopy in the context of HIV-associated neurological disorder. First, we developed novel methodology to study the association of a marker of neurological disorder with the MR spectrum from the brain, and how this association evolves over time. The problem fits into the framework of a scalar-on-function regression model, with the individual spectrum as the functional predictor. We extended one of the existing cross-sectional scalar-on-function regression techniques to the longitudinal setup. Advantages of the proposed method include: (1) the ability to model a flexible time-varying association between the response and the functional predictor, and (2) the ability to incorporate prior information. The second part of the research studies the influence of clinical and demographic factors on the progression of brain metabolites over time. To understand the influence of these factors in a fully non-parametric way, we proposed the LongCART algorithm for constructing regression trees with longitudinal data. Such a regression tree helps identify smaller subpopulations (characterized by baseline factors) with differential longitudinal profiles, and hence helps identify the influence of baseline factors.
Advantages of the LongCART algorithm include: (1) it maintains the type-I error rate in determining the best split, (2) it substantially reduces computation time, and (3) it is applicable even when observations are taken at subject-specific time points. Finally, we carried out an in-depth analysis of longitudinal changes in brain metabolite concentrations in three brain regions, namely white matter, gray matter, and basal ganglia, in chronically infected HIV patients enrolled in the HIV Neuroimaging Consortium study. We studied the influence of important baseline factors (clinical and demographic) on these longitudinal profiles of brain metabolites using the LongCART algorithm in order to identify subgroups of patients at higher risk of neurological impairment.

Item An Analysis of Survival Data when Hazards are not Proportional: Application to a Cancer Treatment Study (2021-12) White, John Benjamin; Yiannoutsos, Constantin; Bakoyannis, Giorgos; Fadel, William

The crossing of Kaplan-Meier survival curves presents a challenge when conducting survival analysis studies, making it unclear whether any of the study groups involved presents a significant difference in survival. An approach based on the maximum vertical distance between the curves is considered here as a method to assess whether a survival advantage exists between different groups of patients. The method is illustrated on a dataset containing survival times of patients treated with two cancer treatment regimens, one involving treatment with chemotherapy alone and the other treatment with both chemotherapy and radiotherapy.

Item Applications of Time to Event Analysis in Clinical Data (2021-12) Xu, Chenjia; Gao, Sujuan; Liu, Hao; Zang, Yong; Zhang, Jianjun; Zhao, Yi

Survival analysis has broad applications in diverse research areas. In this dissertation, we consider innovative applications of the survival analysis approach to phase I dose-finding design and to the modeling of multivariate survival data.
In the first part of the dissertation, we apply time-to-event analysis in an innovative dose-finding design. To account for the unique features of a new class of oncology drugs, T-cell engagers, we propose a phase I dose-finding method incorporating systematic intra-subject dose escalation. We utilize a survival analysis approach to analyze intra-subject dose-escalation data and to identify the maximum tolerated dose. We evaluate the operating characteristics of the proposed design through simulation studies and compare it to existing methodologies. The second part of the dissertation focuses on multivariate survival data with semi-competing risks. Time-to-event data from the same subject are often correlated. In addition, semi-competing risks are sometimes present with correlated events, when a terminal event can censor other non-terminal events but not vice versa. We use a semiparametric frailty model to account for the dependence between correlated survival events and semi-competing risks, and adopt a penalized partial likelihood (PPL) approach for parameter estimation. In addition, we investigate methods for variable selection in semiparametric frailty models and propose a double penalized partial likelihood (DPPL) procedure for variable selection of fixed effects in frailty models. We consider two penalty functions: the least absolute shrinkage and selection operator (LASSO) and the smoothly clipped absolute deviation (SCAD) penalty. The proposed methods are evaluated in simulation studies and illustrated using data from the Indianapolis-Ibadan Dementia Project.

Item Association Between Tobacco Related Diagnoses and Alzheimer Disease: A Population Study (2022-05) Almalki, Amwaj Ghazi; Zhang, Pengyue; Johnson, Travis; Fadel, William

Background: Tobacco use is associated with an increased risk of developing Alzheimer's disease (AD); 14% of the incidence of AD is associated with various types of tobacco exposure.
Additional real-world evidence is warranted to reveal the association between tobacco use and AD in age- and gender-specific subpopulations. Method: In this thesis, the relationships between tobacco-related diagnoses and diagnoses of AD in gender- and age-specific subgroups were investigated using health information exchange data. The non-parametric Kaplan-Meier method was used to estimate the incidence of AD, and the log-rank test was used to compare incidence between individuals with and without tobacco-related diagnoses. In addition, we used semi-parametric Cox models to examine the association between tobacco-related diagnoses and diagnoses of AD while adjusting for covariates. Results: A tobacco-related diagnosis was associated with an increased risk of developing AD, compared to no tobacco-related diagnosis, among individuals aged 60-74 years (female hazard ratio [HR] = 1.26, 95% confidence interval [CI]: 1.07-1.48, p-value = 0.005; male HR = 1.33, 95% CI: 1.10-1.62, p-value = 0.004). A tobacco-related diagnosis was associated with a decreased risk of developing AD among individuals aged 75-100 years (female HR = 0.79, 95% CI: 0.70-0.89, p-value = 0.001; male HR = 0.90, 95% CI: 0.82-0.99, p-value = 0.023). Conclusion: Among older adults aged 60-74 years, tobacco-related diagnoses were associated with an increased risk of developing AD; among older adults aged 75-100 years, tobacco-related diagnoses were associated with a decreased risk of developing AD.

Item Bayesian Adaptive Designs for Early Phase Clinical Trials (2023-07) Guo, Jiaying; Zang, Yong; Han, Jiali; Zhao, Yi; Ren, Jie

Delayed toxicity outcomes are common in phase I clinical trials, especially in oncology studies. They cause logistic difficulties, waste resources, and prolong the trial duration. We propose the time-to-event 3+3 (T-3+3) design to address the delayed-outcome issue for the 3+3 design.
We convert the dose decision rules of the 3+3 design into a series of events. A transparent yet efficient Bayesian probability model is applied to calculate the probabilities of these events in the presence of delayed outcomes, taking the informative remaining follow-up times of pending patients into consideration. The T-3+3 design only models the information for the pending patients and seamlessly reduces to the conventional 3+3 design in the absence of delayed outcomes. We further extend the proposed method to the interval 3+3 (i3+3) design, an algorithm-based phase I dose-finding design built on simple but more comprehensive rules that account for variability in the observed data. Similarly, the dose escalation/de-escalation decision is recommended by comparing event probabilities, which are calculated by considering the ratio between the average follow-up time of at-risk patients and the total assessment window. We evaluate the operating characteristics of the proposed designs through simulation studies and compare them to existing methods. The umbrella trial is a clinical trial strategy that accommodates the paradigm shift toward personalized medicine: it evaluates multiple investigational drugs in different subgroups of patients with the same disease. A Bayesian adaptive umbrella trial design is proposed to select effective targeted agents for different biomarker-based subgroups of patients. To facilitate treatment evaluation, the design uses a mixture regression model that jointly models short-term and long-term response outcomes. In addition, a data-driven latent class model is employed to adaptively combine subgroups into induced latent classes based on overall data heterogeneity, which improves the statistical power of the umbrella trial. To enhance individual ethics, the design includes a response-adaptive randomization scheme with early stopping rules for futility and superiority.
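Futility/superiority decisions of this kind rest on posterior probabilities of the response rate. The sketch below is a generic Beta-Binomial illustration only, not the mixture regression model described above; the prior, the target rate p0, the interim response counts, and the stopping cutoffs are all hypothetical:

```python
import math

def prob_response_exceeds(x, n, p0, a=1.0, b=1.0, grid=20000):
    """P(p > p0 | data) under a Beta(a, b) prior with x responses in n patients.

    The posterior is Beta(a + x, b + n - x); its tail probability is
    integrated numerically on a grid to keep the sketch dependency-free."""
    a1, b1 = a + x, b + n - x
    log_c = math.lgamma(a1 + b1) - math.lgamma(a1) - math.lgamma(b1)
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid  # midpoint of each grid cell
        if p > p0:
            total += math.exp(log_c + (a1 - 1.0) * math.log(p)
                              + (b1 - 1.0) * math.log(1.0 - p)) / grid
    return total

# Hypothetical interim data: stop for superiority when P(p > p0) is large,
# for futility when it is small (cutoffs would be calibrated by simulation).
go = prob_response_exceeds(12, 20, p0=0.3)   # arm with strong observed response
stop = prob_response_exceeds(2, 20, p0=0.3)  # arm with weak observed response
```

In an actual design these probabilities would be compared against calibrated stopping thresholds at each interim look.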
Bayesian posterior probabilities are used to make these decisions. Simulation studies demonstrate that the proposed design outperforms two conventional designs across a range of practical treatment-outcome scenarios.

Item Bayesian Adaptive Designs for Phase II Clinical Trials Evaluating Subgroup-Specific Treatment Effect (2024-12) Shan, Mu; Zang, Yong; Han, Jiali; Tu, Wanzhu; Zhang, Pengyue

In phase II clinical trials, particularly for molecularly targeted agents (MTAs) and biotherapies, there is a critical need to evaluate subgroup-specific treatment effects due to the heterogeneous nature of these therapies. This dissertation introduces two innovative Bayesian adaptive designs for biomarker-guided clinical trials: the Bayesian Order Constrained Adaptive (BOCA) design and the Bayesian Adaptive Marker-Stratified Design Using Calibrated Spike-and-Slab Priors (SSS). The BOCA design addresses the limitations of the "one-size-fits-all" approach in non-randomized phase II trials by efficiently detecting subgroup-specific treatment effects. It combines elements of enrichment and sequential designs, starting with an "all-comers" stage and transitioning to an enrichment stage based on interim analysis results. The decision to continue with either the marker-positive or marker-negative subgroup is guided by two posterior probabilities that exploit inherent ordering constraints. This adaptive approach enhances trial efficiency and cost-effectiveness while managing missing biomarker data. Comprehensive simulation studies show that the BOCA design outperforms conventional designs in detecting subgroup-specific treatment effects, making it a robust tool for phase II trials. The SSS design improves the efficiency of marker-stratified designs (MSDs) by leveraging clinical features of biomarkers and treatments. Patients are classified into marker-positive and marker-negative subgroups and randomized to receive either the MTA or a control treatment.
The SSS design uses spike-and-slab priors to dynamically share information on response rates across subgroups, governed by two posterior probabilities that assess the similarity of the response rates. Additionally, it incorporates a Bayesian multiple imputation method to address missing biomarker profiles. Simulation studies confirm that the SSS design exhibits favorable operating characteristics, surpassing conventional designs in evaluating subgroup-specific treatment effects. Both the BOCA and SSS designs represent significant advancements in Bayesian adaptive methodologies for phase II trials. By addressing the limitations of traditional approaches, these designs enhance the evaluation of subgroup-specific treatment effects, contributing valuable methodologies to the field of personalized medicine.

Item Bayesian Adaptive Dose-Finding Clinical Trial Designs with Late-Onset Outcomes (2021-07) Zhang, Yifei; Zhang, Yong; Song, Yiqing; Liu, Hao; Bakoyannis, Giorgos

The late-onset outcome issue is common in early phase dose-finding clinical trials. This problem becomes more intractable in phase I/II clinical trials because both toxicity and efficacy responses are subject to the late-onset outcome issue. Existing methods for phase I trials cannot be used directly for phase I/II trials because they lack the capability to model the joint toxicity-efficacy distribution. We propose a conditional weighted likelihood (CWL) method to circumvent this issue. The key idea of the CWL method is to decompose the joint probability into the product of marginal and conditional probabilities and then weight each probability based on each patient's actual follow-up time. We further extend the proposed method to handle more complex situations where the late-onset outcomes are competing risks or semi-competing risks outcomes.
We treat the late-onset competing risks/semi-competing risks outcomes as missing data and develop a series of Bayesian data-augmentation methods to efficiently impute the missing data and draw posterior samples of the parameters of interest. We also propose adaptive dose-finding algorithms to allocate patients and identify the optimal biological dose during the trial. Simulation studies show that the proposed methods yield desirable operating characteristics and outperform existing methods.

Item Bayesian design and analysis of cluster randomized trials (2017-08-07) Xiao, Shan; Tu, Wanzhu; Liu, Ziyue

Cluster randomization is frequently used in clinical trials for the convenience of interventional implementation and for reducing the risk of contamination. The operational convenience of cluster randomized trials, however, is gained at the expense of reduced analytical power: compared to individually randomized studies, cluster randomized trials often have much reduced power. In this dissertation, I consider ways of enhancing analytical power with historical trial data. Specifically, I introduce a hierarchical Bayesian model designed to incorporate available information from previous trials of the same or similar interventions. Operationally, the amount of information gained from the previous trials is determined by a Kullback-Leibler divergence measure that quantifies the similarity, or lack thereof, between the historical and current trial data. More weight is given to the historical data if they more closely resemble the current trial data. Along this line, I examine the type I error rates and analytical power associated with the proposed method, in comparison with existing methods that do not utilize the ancillary historical information. Similarly, to design a cluster randomized trial, one could estimate the power by simulating trial data and comparing them with historical data from published studies.
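The divergence-based borrowing idea can be sketched in miniature. The sketch below is an illustration under strong assumptions, not the dissertation's model: it reduces each trial to a normal (mean, SD) summary and uses an arbitrary exponential weight function so that historical data resembling the current trial receive weight near 1 and dissimilar data near 0:

```python
import math

def kl_normal(mu0, sd0, mu1, sd1):
    """KL divergence KL(N(mu0, sd0^2) || N(mu1, sd1^2)), closed form."""
    return (math.log(sd1 / sd0)
            + (sd0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sd1 ** 2)
            - 0.5)

def borrowing_weight(hist, curr, scale=1.0):
    """Weight in (0, 1] given to historical data, where hist and curr are
    (mean, sd) summaries; the exponential form is an illustrative choice."""
    d = kl_normal(hist[0], hist[1], curr[0], curr[1])
    return math.exp(-d / scale)

w_similar = borrowing_weight(hist=(0.0, 1.0), curr=(0.1, 1.0))  # near-match
w_distant = borrowing_weight(hist=(2.0, 1.0), curr=(0.1, 1.0))  # poor match
```

Here the similar historical summary earns a weight close to 1, while the distant one is heavily discounted.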
Data-analytic and power simulation methods are developed for more general cluster randomized trial situations, with multiple arms and multiple types of data following the exponential family of distributions. An R package is developed for practical use of the methods in data analysis and trial design.

Item Building Prediction Models for Dementia: The Need to Account for Interval Censoring and the Competing Risk of Death (2019-08) Marchetti, Arika L.; Bakoyannis, Giorgos; Li, Xiaochun; Gao, Sujuan; Yiannoutsos, Constantin

Context. Prediction models for dementia are crucial for informing clinical decision making in older adults. Previous models have used genotype and age to obtain risk scores for Alzheimer's disease, one of the most common forms of dementia (Desikan et al., 2017). However, previous prediction models do not account for the fact that the time to dementia onset is unknown, lying between the last negative and the first positive dementia diagnosis time (interval censoring). Instead, these models use time to diagnosis, which is greater than or equal to the true dementia onset time. Furthermore, these models do not account for the competing risk of death, which is quite frequent among older adults. Objectives. To develop a prediction model for dementia that accounts for interval censoring and the competing risk of death, and to compare the predictions from this model with the predictions from a naïve analysis that ignores both. Methods. We apply the semiparametric sieve maximum likelihood (SML) approach to simultaneously model the cumulative incidence function (CIF) of dementia and death while accounting for interval censoring (Bakoyannis, Yu, & Yiannoutsos, 2017). The SML is implemented using the R package intccr. The CIF curves of dementia are compared for the SML and the naïve approach using a dataset from the Indianapolis-Ibadan Dementia Project. Results.
The CIFs from the SML and the naïve approach illustrated that, for healthier individuals at baseline, the naïve approach underestimated the incidence of dementia compared to the SML, as a result of interval censoring. Individuals in poorer health at baseline have a CIF that appears to be overestimated by the naïve approach; this is due to older individuals with poor health conditions having an elevated risk of death. Conclusions. The SML method, which accounts for the competing risk of death along with interval censoring, should be used for fitting prediction/prognostic models of dementia to inform clinical decision making in older adults. Without controlling for the competing risk of death and interval censoring, current models can provide invalid predictions of the CIF of dementia.

Item classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows (2020-05) Key, Melissa Chester; Boukai, Benzion; Ragg, Susanne; Katz, Barry; Mosley, Amber

Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics infers the peptide sequence of each measurement, there is inherent uncertainty in the identity of each peptide and its originating protein. Removing misidentified peptides can improve the accuracy and power of downstream analyses when differences between proteins are of primary interest. In this dissertation I present classCleaner, a novel algorithm designed to identify misidentified peptides within each protein using the available quantitative data. The algorithm is based on the idea that distances between peptides belonging to the same protein are stochastically smaller than those between peptides in different proteins. The method first determines a threshold based on the estimated distributions of these two groups of distances. This threshold is then used to create a decision rule for each peptide based on counting the number of within-protein distances smaller than the threshold.
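The counting decision rule just described might be sketched as follows; the Euclidean distance, the externally supplied threshold, and the 50% cutoff fraction are illustrative assumptions, not classCleaner's actual choices (in particular, estimating the threshold from the two distance distributions is the algorithm's own step and is taken as given here):

```python
import numpy as np

def flag_misidentified(X, threshold, min_frac=0.5):
    """Keep a peptide if at least min_frac of its distances to the other
    peptides assigned to the same protein fall below the threshold.

    X is an (n_peptides, n_samples) array of quantitative profiles for one
    protein; returns a boolean mask of peptides to keep."""
    n = X.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)  # distances to peptide i
        close = np.count_nonzero(dist[np.arange(n) != i] < threshold)
        keep[i] = close >= min_frac * (n - 1)
    return keep
```

On simulated data with a tight cluster of peptide profiles plus one far-away profile, the rule keeps the cluster and flags the outlier as misidentified.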
Using simulated data, I show that classCleaner always reduces the proportion of misidentified peptides, with better results for larger proteins (by number of constituent peptides), smaller inherent misidentification rates, and larger sample sizes. classCleaner is also applied to an LC-MS/MS proteomics data set and to the Congressional Voting Records data set from the UCI Machine Learning Repository. The latter is used to demonstrate that the algorithm is not specific to proteomics.