Biostatistics Department Theses and Dissertations
Showing 1-10 of 58 items
Item: Advanced Modeling of Longitudinal Spectroscopy Data (2014)
Kundu, Madan Gopal; Harezlak, Jaroslaw; Randolph, Timothy W.; Sarkar, Jyotirmoy; Steele, Gregory K.; Yiannoutsos, Constantin T.

Magnetic resonance (MR) spectroscopy is a neuroimaging technique widely used to quantify the concentrations of important metabolites in brain tissue. An imbalance in the concentrations of brain metabolites has been found to be associated with the development of neurological impairment, and there is an increasing trend of using MR spectroscopy as a diagnostic tool for neurological disorders. We established statistical methodology to analyze data obtained from MR spectroscopy in the context of HIV-associated neurological disorder. First, we developed novel methodology to study the association between a marker of neurological disorder and the MR spectrum from the brain, and how this association evolves over time. The problem fits into the framework of the scalar-on-function regression model, with the individual spectrum as the functional predictor. We extended an existing cross-sectional scalar-on-function regression technique to the longitudinal setting. Advantages of the proposed method include: (1) the ability to model a flexible time-varying association between the response and the functional predictor, and (2) the ability to incorporate prior information. The second part of the research studies the influence of clinical and demographic factors on the progression of brain metabolites over time. To understand the influence of these factors in a fully non-parametric way, we proposed the LongCART algorithm to construct regression trees with longitudinal data. Such a regression tree helps to identify smaller subpopulations (characterized by baseline factors) with differential longitudinal profiles, and hence helps to identify the influence of baseline factors.
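To make the tree-splitting idea concrete, here is a minimal sketch in which each subject's longitudinal profile is summarized by a least-squares slope and candidate splits on a baseline factor are ranked by the contrast in mean slope between the two child nodes. This is an illustrative simplification, not the LongCART algorithm itself (which tests for parameter instability and controls the type-I error of the split decision); all data are hypothetical.

```python
import numpy as np

def subject_slope(times, values):
    # Ordinary least-squares slope summarizing one subject's longitudinal profile.
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    t_c = t - t.mean()
    return float(np.dot(t_c, y - y.mean()) / np.dot(t_c, t_c))

def split_gap(baseline, slopes, cutoff):
    # Contrast in mean slope between the two child nodes induced by
    # splitting on a baseline factor at `cutoff`.
    baseline = np.asarray(baseline, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    return abs(slopes[baseline <= cutoff].mean() - slopes[baseline > cutoff].mean())

# Toy data: 40 subjects whose true slope depends on a baseline factor.
rng = np.random.default_rng(0)
times = np.arange(5.0)
baseline = np.linspace(0.0, 1.0, 40)
true_slopes = np.where(baseline <= 0.5, -1.0, 1.0)
slopes = [subject_slope(times, b * times + rng.normal(0, 0.1, 5)) for b in true_slopes]

# The split at the true change point shows the largest subgroup contrast.
print(split_gap(baseline, slopes, cutoff=0.5), split_gap(baseline, slopes, cutoff=0.8))
```

A real implementation would search over all candidate baseline factors and cutoffs, and replace the raw contrast with a formal test that keeps the splitting decision's type-I error under control.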
Advantages of the LongCART algorithm include: (1) it maintains the type-I error rate in determining the best split, (2) it substantially reduces computation time, and (3) it is applicable even when observations are taken at subject-specific time points. Finally, we carried out an in-depth analysis of longitudinal changes in brain metabolite concentrations in three brain regions, namely white matter, gray matter, and basal ganglia, in chronically infected HIV patients enrolled in the HIV Neuroimaging Consortium study. We studied the influence of important baseline clinical and demographic factors on these longitudinal profiles of brain metabolites using the LongCART algorithm, in order to identify subgroups of patients at higher risk of neurological impairment.

Item: An Analysis of Survival Data when Hazards are not Proportional: Application to a Cancer Treatment Study (2021-12)
White, John Benjamin; Yiannoutsos, Constantin; Bakoyannis, Giorgos; Fadel, William

The crossing of Kaplan-Meier survival curves presents a challenge when conducting survival analyses, making it unclear whether any of the study groups involved differ significantly in survival. An approach based on the maximum vertical distance between the curves is considered here as a method to assess whether a survival advantage exists between different groups of patients. The method is illustrated on a dataset containing survival times of patients treated with two cancer treatment regimens: one involving treatment by chemotherapy alone, and the other treatment with both chemotherapy and radiotherapy.

Item: Applications of Time to Event Analysis in Clinical Data (2021-12)
Xu, Chenjia; Gao, Sujuan; Liu, Hao; Zang, Yong; Zhang, Jianjun; Zhao, Yi

Survival analysis has broad applications in diverse research areas. In this dissertation, we consider an innovative application of the survival analysis approach to phase I dose-finding design and the modeling of multivariate survival data.
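Returning to the crossing-curves analysis above: the maximum vertical distance between two Kaplan-Meier estimates can be found by evaluating both step functions on the union of their event times. Below is a from-scratch sketch with hypothetical survival data, not production code and not the thesis's exact procedure.

```python
import numpy as np

def kaplan_meier(times, events):
    # Kaplan-Meier estimate; returns event times and S(t) just after each.
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    uniq = np.unique(times[events == 1])
    at_risk = np.array([(times >= t).sum() for t in uniq])
    deaths = np.array([((times == t) & (events == 1)).sum() for t in uniq])
    return uniq, np.cumprod(1.0 - deaths / at_risk)

def km_step(grid, uniq, surv):
    # Evaluate the right-continuous step function S(t) on a grid.
    out = np.ones(len(grid), dtype=float)
    for i, t in enumerate(grid):
        idx = np.searchsorted(uniq, t, side="right") - 1
        out[i] = 1.0 if idx < 0 else surv[idx]
    return out

def max_vertical_distance(t1, e1, t2, e2):
    u1, s1 = kaplan_meier(t1, e1)
    u2, s2 = kaplan_meier(t2, e2)
    grid = np.union1d(u1, u2)   # the supremum is attained at an event time
    return float(np.max(np.abs(km_step(grid, u1, s1) - km_step(grid, u2, s2))))

# Hypothetical survival times (months) and event indicators for two regimens.
chemo    = ([2, 4, 5, 7, 9, 12, 15], [1, 1, 0, 1, 1, 0, 1])
chemo_rt = ([3, 6, 8, 10, 14, 16, 20], [1, 0, 1, 1, 0, 1, 1])
print(max_vertical_distance(*chemo, *chemo_rt))
```

Assessing the significance of this statistic requires a reference distribution (e.g., by permutation), which the sketch omits.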
In the first part of the dissertation, we apply time-to-event analysis in an innovative dose-finding design. To account for the unique features of a new class of oncology drugs, T-cell engagers, we propose a phase I dose-finding method incorporating systematic intra-subject dose escalation. We utilize a survival analysis approach to analyze intra-subject dose-escalation data and to identify the maximum tolerated dose. We evaluate the operating characteristics of the proposed design through simulation studies and compare it to existing methodologies. The second part of the dissertation focuses on multivariate survival data with semi-competing risks. Time-to-event data from the same subject are often correlated. In addition, semi-competing risks are sometimes present with correlated events, when a terminal event can censor other non-terminal events but not vice versa. We use a semiparametric frailty model to account for the dependence between correlated survival events and semi-competing risks, and adopt a penalized partial likelihood (PPL) approach for parameter estimation. In addition, we investigate methods for variable selection in semiparametric frailty models and propose a double penalized partial likelihood (DPPL) procedure for selecting fixed effects in frailty models. We consider two penalty functions: the least absolute shrinkage and selection operator (LASSO) and the smoothly clipped absolute deviation (SCAD) penalty. The proposed methods are evaluated in simulation studies and illustrated using data from the Indianapolis-Ibadan Dementia Project.

Item: Association Between Tobacco Related Diagnoses and Alzheimer Disease: A Population Study (2022-05)
Almalki, Amwaj Ghazi; Zhang, Pengyue; Johnson, Travis; Fadel, William

Background: Tobacco use is associated with an increased risk of developing Alzheimer's disease (AD); 14% of the incidence of AD is associated with various types of tobacco exposure.
Additional real-world evidence is warranted to reveal the association between tobacco use and AD in age- and gender-specific subpopulations. Methods: In this thesis, the relationships between diagnoses related to tobacco use and diagnoses of AD in gender- and age-specific subgroups were investigated using health information exchange data. The non-parametric Kaplan-Meier method was used to estimate the incidence of AD. Furthermore, the log-rank test was used to compare incidence between individuals with and without tobacco-related diagnoses. In addition, we used semi-parametric Cox models to examine the association between tobacco-related diagnoses and diagnoses of AD while adjusting for covariates. Results: A tobacco-related diagnosis was associated with an increased risk of developing AD, compared to no tobacco-related diagnosis, among individuals aged 60-74 years (female hazard ratio [HR] = 1.26, 95% confidence interval [CI]: 1.07-1.48, p-value = 0.005; male HR = 1.33, 95% CI: 1.10-1.62, p-value = 0.004). A tobacco-related diagnosis was associated with a decreased risk of developing AD, compared to no tobacco-related diagnosis, among individuals aged 75-100 years (female HR = 0.79, 95% CI: 0.70-0.89, p-value = 0.001; male HR = 0.90, 95% CI: 0.82-0.99, p-value = 0.023). Conclusion: Tobacco-related diagnoses were associated with an increased risk of developing AD among older adults aged 60-74 years, and with a decreased risk of developing AD among older adults aged 75-100 years.

Item: Bayesian Adaptive Designs for Early Phase Clinical Trials (2023-07)
Guo, Jiaying; Zang, Yong; Han, Jiali; Zhao, Yi; Ren, Jie

Delayed toxicity outcomes are common in phase I clinical trials, especially in oncology studies. They cause logistical difficulties, waste resources, and prolong the trial duration. We propose the time-to-event 3+3 (T-3+3) design to address the delayed outcome issue for the 3+3 design.
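The flavor of the pending-patient calculation can be sketched as follows, under the simplifying and purely illustrative assumption that toxicity times are uniformly distributed over the assessment window. This is not the authors' Bayesian model, just the conditional-probability bookkeeping that such a model formalizes; all numbers are hypothetical.

```python
import math

def pending_dlt_prob(p, followed, window):
    # P(toxicity by the end of the window | none observed so far), assuming
    # toxicity times are uniform over the window -- a simplifying assumption.
    frac = followed / window
    return p * (1.0 - frac) / (1.0 - p * frac)

def prob_escalate(p, followed, window, observed_dlts=0):
    # Probability that a 3-patient cohort finishes with 0 DLTs (the classical
    # 3+3 escalation event), given each pending patient's follow-up so far.
    if observed_dlts > 0:
        return 0.0
    return math.prod(1.0 - pending_dlt_prob(p, t, window) for t in followed)

# Hypothetical cohort: three patients followed 10, 20, and 25 days
# of a 28-day assessment window, with no DLTs observed yet.
print(prob_escalate(0.2, [10, 20, 25], 28))
```

Note how a fully followed patient (followed == window) contributes no residual toxicity risk, so the calculation reduces to the deterministic 3+3 rule when no outcomes are pending.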
We convert the dose decision rules of the 3+3 design into a series of events. A transparent yet efficient Bayesian probability model is applied to calculate the probabilities of these events in the presence of delayed outcomes, taking into consideration the remaining follow-up time of pending patients. The T-3+3 design models only the information for the pending patients and seamlessly reduces to the conventional 3+3 design in the absence of delayed outcomes. We further extend the proposed method to the interval 3+3 (i3+3) design, an algorithm-based phase I dose-finding design based on simple but more comprehensive rules that account for variability in the observed data. Similarly, the dose escalation/de-escalation decision is recommended by comparing the event probabilities, which are calculated by considering the ratio between the average follow-up time for at-risk patients and the total assessment window. We evaluate the operating characteristics of the proposed designs through simulation studies and compare them to existing methods. The umbrella trial is a clinical trial strategy that accommodates the paradigm shift toward personalized medicine, evaluating multiple investigational drugs in different subgroups of patients with the same disease. A Bayesian adaptive umbrella trial design is proposed to select effective targeted agents for different biomarker-based subgroups of patients. To facilitate treatment evaluation, the design uses a mixture regression model that jointly models short-term and long-term response outcomes. In addition, a data-driven latent class model is employed to adaptively combine subgroups into induced latent classes based on overall data heterogeneity, which improves the statistical power of the umbrella trial. To enhance individual ethics, the design includes a response-adaptive randomization scheme with early stopping rules for futility and superiority.
Bayesian posterior probabilities are used to make these decisions. Simulation studies demonstrate that the proposed design outperforms two conventional designs across a range of practical treatment-outcome scenarios.

Item: Bayesian Adaptive Dose-Finding Clinical Trial Designs with Late-Onset Outcomes (2021-07)
Zhang, Yifei; Zhang, Yong; Song, Yiqing; Liu, Hao; Bakoyannis, Giorgos

The late-onset outcome issue is common in early phase dose-finding clinical trials. This problem becomes more intractable in phase I/II clinical trials because both toxicity and efficacy responses are subject to the late-onset outcome issue. The existing methods for phase I trials cannot be used directly for phase I/II trials because they cannot model the joint toxicity-efficacy distribution. We propose a conditional weighted likelihood (CWL) method to circumvent this issue. The key idea of the CWL method is to decompose the joint probability into the product of marginal and conditional probabilities and then weight each probability based on each patient's actual follow-up time. We further extend the proposed method to handle more complex situations where the late-onset outcomes are competing risks or semi-competing risks outcomes. We treat the late-onset competing risks/semi-competing risks outcomes as missing data and develop a series of Bayesian data-augmentation methods to efficiently impute the missing data and draw the posterior samples of the parameters of interest. We also propose adaptive dose-finding algorithms to allocate patients and identify the optimal biological dose during the trial.
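The spirit of weighting each patient's contribution by actual follow-up time can be sketched with a crude follow-up-weighted toxicity estimate. This is only the weighting intuition, not the CWL decomposition itself, and all numbers are hypothetical.

```python
def weighted_tox_estimate(outcomes, follow_up, window):
    # Weighted estimate of the toxicity probability: patients with an observed
    # toxicity or complete follow-up get weight 1; pending toxicity-free
    # patients are down-weighted by their observed fraction of the window.
    num = den = 0.0
    for y, t in zip(outcomes, follow_up):
        w = 1.0 if (y == 1 or t >= window) else t / window
        num += w * y
        den += w
    return num / den

# Hypothetical cohort: outcome 1 = toxicity observed, 0 = none so far,
# with follow-up in days over a 28-day assessment window.
outcomes = [1, 0, 0, 0]
follow_up = [5, 28, 14, 7]
print(weighted_tox_estimate(outcomes, follow_up, 28))
```

When every patient has complete follow-up, the estimate reduces to the ordinary proportion of toxicities, mirroring how the CWL approach reduces to a standard likelihood in the absence of pending outcomes.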
Simulation studies show that the proposed methods yield desirable operating characteristics and outperform the existing methods.

Item: Bayesian design and analysis of cluster randomized trials (2017-08-07)
Xiao, Shan; Tu, Wanzhu; Liu, Ziyue

Cluster randomization is frequently used in clinical trials for the convenience of interventional implementation and for reducing the risk of contamination. The operational convenience of cluster randomized trials, however, is gained at the expense of reduced analytical power: compared to individually randomized studies, cluster randomized trials often have much-reduced power. In this dissertation, I consider ways of enhancing analytical power with historical trial data. Specifically, I introduce a hierarchical Bayesian model designed to incorporate available information from previous trials of the same or similar interventions. Operationally, the amount of information gained from the previous trials is determined by a Kullback-Leibler divergence measure that quantifies the similarity, or lack thereof, between the historical and current trial data; more weight is given to the historical data if they more closely resemble the current trial data. Along this line, I examine the type I error rates and analytical power associated with the proposed method, in comparison with existing methods that do not utilize the ancillary historical information. Similarly, to design a cluster randomized trial, one could estimate the power by simulating trial data and comparing them with historical data from published studies. Data-analytical and power-simulation methods are developed for more general situations of cluster randomized trials, with multiple arms and multiple types of data following the exponential family of distributions.
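The divergence-based borrowing idea can be sketched with moment-matched Gaussians: fit a normal distribution to each dataset, compute the KL divergence between them, and discount the historical data accordingly. The exp(-KL) mapping below is an illustrative choice, not the model in the dissertation, and the data are hypothetical.

```python
import math

def gaussian_kl(mu0, var0, mu1, var1):
    # KL divergence KL( N(mu0, var0) || N(mu1, var1) ).
    return 0.5 * (var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0 + math.log(var1 / var0))

def borrow_weight(hist, curr):
    # Map the divergence between moment-matched Gaussians to a (0, 1] weight:
    # identical data give weight 1; dissimilar data are discounted toward 0.
    def moments(x):
        m = sum(x) / len(x)
        v = sum((xi - m) ** 2 for xi in x) / (len(x) - 1)
        return m, v
    kl = gaussian_kl(*moments(hist), *moments(curr))
    return math.exp(-kl)

# Hypothetical cluster-level summaries from a historical and a current trial.
historical = [4.8, 5.1, 5.0, 4.9, 5.2]
current = [5.0, 5.2, 4.9, 5.1, 5.0]
print(borrow_weight(historical, current))
```

Identical samples give a weight of exactly 1 (full borrowing), while grossly dissimilar samples are discounted to a weight near 0 (effectively ignoring the historical data).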
An R package is developed for practical use of the methods in data analysis and trial design.

Item: Building Prediction Models for Dementia: The Need to Account for Interval Censoring and the Competing Risk of Death (2019-08)
Marchetti, Arika L.; Bakoyannis, Giorgos; Li, Xiaochun; Gao, Sujuan; Yiannoutsos, Constantin

Context: Prediction models for dementia are crucial for informing clinical decision making in older adults. Previous models have used genotype and age to obtain risk scores for Alzheimer's disease, one of the most common forms of dementia (Desikan et al., 2017). However, previous prediction models do not account for the fact that the time of dementia onset is unknown, lying between the last negative and the first positive dementia diagnosis times (interval censoring). Instead, these models use the time of diagnosis, which is greater than or equal to the true dementia onset time. Furthermore, these models do not account for the competing risk of death, which is quite frequent among older adults. Objectives: To develop a prediction model for dementia that accounts for interval censoring and the competing risk of death, and to compare the predictions from this model with those from a naïve analysis that ignores interval censoring and the competing risk of death. Methods: We apply the semiparametric sieve maximum likelihood (SML) approach to simultaneously model the cumulative incidence function (CIF) of dementia and death while accounting for interval censoring (Bakoyannis, Yu, & Yiannoutsos, 2017). The SML is implemented using the R package intccr. The CIF curves of dementia are compared for the SML and the naïve approach using a dataset from the Indianapolis-Ibadan Dementia Project. Results: The CIF curves from the SML and the naïve approach illustrated that, for healthier individuals at baseline, the naïve approach underestimated the incidence of dementia compared to the SML, as a result of interval censoring.
Individuals in poorer health at baseline had a CIF that appeared to be overestimated in the naïve approach, because older individuals in poor health have an elevated risk of death. Conclusions: The SML method, which accounts for the competing risk of death along with interval censoring, should be used for fitting prediction/prognostic models of dementia to inform clinical decision making in older adults. Without controlling for the competing risk of death and interval censoring, current models can provide invalid predictions of the CIF of dementia.

Item: classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows (2020-05)
Key, Melissa Chester; Boukai, Benzion; Ragg, Susanne; Katz, Barry; Mosley, Amber

Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics infers the peptide sequence of each measurement, there is inherent uncertainty in the identity of each peptide and its originating protein. Removing misidentified peptides can improve the accuracy and power of downstream analyses when differences between proteins are of primary interest. In this dissertation I present classCleaner, a novel algorithm designed to identify misidentified peptides for each protein using the available quantitative data. The algorithm is based on the idea that distances between peptides belonging to the same protein are stochastically smaller than those between peptides in different proteins. The method first determines a threshold based on the estimated distributions of these two groups of distances. This threshold is then used to create a decision rule for each peptide based on counting the number of within-protein distances smaller than the threshold.
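A toy version of such a decision rule is sketched below, using a correlation-based distance and a fixed threshold and vote count. Both the distance and the cutoffs here are hypothetical stand-ins: classCleaner derives its threshold from the estimated distributions of within- and between-protein distances rather than from a fixed constant.

```python
import numpy as np

def keep_peptide(dist_to_others, threshold, min_votes):
    # Keep a peptide if enough of its distances to the other peptides
    # assigned to the same protein fall below the threshold.
    return int(np.sum(np.asarray(dist_to_others) < threshold)) >= min_votes

def filter_protein(profiles, threshold, min_votes):
    # profiles: rows = peptides, columns = quantitative values across samples.
    # Distance = 1 - Pearson correlation (0 means identical profile shape).
    X = np.asarray(profiles, dtype=float)
    dist = 1.0 - np.corrcoef(X)
    return [keep_peptide(np.delete(dist[i], i), threshold, min_votes)
            for i in range(len(X))]

# Toy protein: three co-varying peptides and one outlier (misidentified?).
protein = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.2, 3.9],
    [4.0, 1.0, 3.5, 0.5],   # does not track the others
]
print(filter_protein(protein, threshold=0.5, min_votes=2))
```

The outlier row fails the vote because its correlation-based distance to every peer exceeds the threshold, while the three co-varying peptides support one another.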
Using simulated data, I show that classCleaner always reduces the proportion of misidentified peptides, with better results for larger proteins (by number of constituent peptides), smaller inherent misidentification rates, and larger sample sizes. classCleaner is also applied to an LC-MS/MS proteomics dataset and to the Congressional Voting Records dataset from the UCI machine learning repository. The latter is used to demonstrate that the algorithm is not specific to proteomics.

Item: Contemporary Outcomes of Distal Lower Extremity Bypass for Chronic Limb Threatening Ischemia and a Model Based Comparison with Non-surgical Therapies (2021-03)
Leckie, Katherin; Bakoyannis, Giorgos; Yiannoutsos, Constantin; Murphy, Michael

Objective: The gold-standard therapy for chronic limb threatening ischemia (CLTI) is revascularization, but in patients in whom below-the-knee bypass is indicated, autologous vein conduit may not be available. Contemporary outcomes of distal bypass with suboptimal conduits have not been well described, and recent advances in non-surgical therapies raise the question of whether, in some cases, these should be considered instead. Methods: Data were obtained from the Vascular Quality Initiative (VQI) registry as well as from a multi-center, randomized clinical trial of cell therapy. The incidence of major amputation after distal bypass was estimated for the VQI cohort by conduit type using non-parametric survival analysis with death as a competing risk. A Cox proportional hazards model was then fit to the pooled data in a stepwise fashion with death as a competing risk, including evaluations of appropriate transformations, time dependency, and interactions for each included covariate, and hazard ratios were estimated for the risk of major amputation by treatment.
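The cumulative incidence of one event type with death as a competing risk can be sketched nonparametrically (an Aalen-Johansen-type calculation). The follow-up data below are hypothetical, not drawn from the VQI registry.

```python
import numpy as np

def cumulative_incidence(times, causes, cause=1):
    # Nonparametric (Aalen-Johansen-type) cumulative incidence for one event
    # type in the presence of competing risks. causes: 0 = censored, 1, 2, ...
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes, dtype=int)
    surv = 1.0   # overall event-free survival just before each time
    cif = 0.0
    grid, values = [], []
    for t in np.unique(times):
        at_risk = int(np.sum(times >= t))
        d_cause = int(np.sum((times == t) & (causes == cause)))
        d_all = int(np.sum((times == t) & (causes > 0)))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_all / at_risk
        grid.append(float(t))
        values.append(cif)
    return grid, values

# Hypothetical follow-up (days): cause 1 = major amputation, 2 = death, 0 = censored.
t = [30, 60, 90, 120, 180, 240, 300, 365]
c = [1,  2,  0,  1,   2,   1,   0,   0]
grid, cif = cumulative_incidence(t, c, cause=1)
print(cif[-1])   # estimated one-year cumulative incidence of amputation
```

Unlike one minus a Kaplan-Meier estimate that censors deaths, this estimator allocates each amputation its share of the overall event-free survival, so deaths correctly reduce the probability mass available to the amputation event.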
Results: At 365 days, the estimated cumulative incidence of major amputation with death as a competing risk was 25% after distal bypass with non-autologous biologic conduit (0.2499, 95% CI 0.2242-0.2785), 13% for prosthetic conduit (0.1276, 95% CI 0.1172-0.1389), and 9% for great saphenous vein (GSV) conduit (0.0900, 95% CI 0.0848-0.0956). The Cox proportional hazards model found a significant interaction between age and treatment. Compared to bypass with non-autologous biologic conduit, the hazard ratios at ages 55, 60, 65, and 70, respectively, were 0.41 (p<0.0001), 0.41 (p<0.0001), 0.42 (p<0.0001), and 0.42 (p<0.0001) for bypass with GSV; 0.68 (p=0.0043), 0.67 (p=0.0004), 0.65 (p<0.0001), and 0.64 (p<0.0001) for bypass with prosthetic conduit; and 0.22 (p=0.0005), 0.34 (p=0.0011), 0.52 (p=0.0196), and 0.76 (p=0.3677) for autologous cell therapy. No significant differences were found between best medical management and distal bypass with non-autologous biologic conduit. Conclusion: The risk of major amputation after distal bypass is lowest in patients with GSV conduit and highest following bypass with non-autologous biologic conduit. Using a semi-parametric model, cell therapy was estimated to significantly decrease the risk of amputation, compared to distal bypass with non-autologous biologic conduit, in younger patients.
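The age-by-treatment interaction reported above means the hazard ratio is itself a function of age. Under a Cox model with such an interaction it takes the form sketched below; the coefficients here are hypothetical, not the fitted values from this study.

```python
import math

def hazard_ratio(beta_trt, beta_interaction, age):
    # Hazard ratio for treatment vs. reference at a given age when the Cox
    # model includes a treatment-by-age interaction:
    #   HR(age) = exp(beta_trt + beta_interaction * age)
    return math.exp(beta_trt + beta_interaction * age)

# Hypothetical coefficients chosen so the HR rises with age,
# as with the cell-therapy comparison reported above.
b_trt, b_int = -2.6, 0.035
for age in (55, 60, 65, 70):
    print(age, round(hazard_ratio(b_trt, b_int, age), 2))
```

A positive interaction coefficient makes the treatment's protective effect fade at older ages, which is why the study reports age-specific hazard ratios rather than a single number.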