Biostatistics Department Theses and Dissertations
Browsing Biostatistics Department Theses and Dissertations by Issue Date
Now showing 1 - 10 of 63
Multivariate semiparametric regression models for longitudinal data (2014)
Li, Zhuokai; Tu, Wanzhu; Liu, Hai; Katz, Barry P.; Fortenberry, J. Dennis
Multiple-outcome longitudinal data are abundant in clinical investigations. For example, infections with different pathogenic organisms are often tested concurrently, and assessments are usually repeated over time. It is therefore natural to consider a multivariate modeling approach that accommodates the interrelationships among the multiple longitudinally measured outcomes. This dissertation proposes a multivariate semiparametric modeling framework for such data and discusses the relevant estimation, inference, and model selection procedures within it. The first part of this research focuses on analytical issues concerning binary data. The second part extends the binary model to the more general setting of data from the exponential family of distributions. The proposed model accounts for the correlations across outcomes as well as the temporal dependence among the repeated measures of each outcome within an individual. An important feature of the model is a bivariate smooth function that depicts the concurrent nonlinear, and possibly interacting, influences of two independent variables on each outcome. For model implementation, parameters are estimated by the maximum penalized likelihood method. For statistical inference, a likelihood-based resampling procedure is proposed to compare the bivariate nonlinear effect surfaces across outcomes. The final part of the dissertation presents a variable selection tool to facilitate model development in practical data analysis.
Using the adaptive least absolute shrinkage and selection operator (LASSO) penalty, the variable selection tool simultaneously identifies important fixed and random effects, determines the correlation structure of the outcomes, and selects the interaction effects in the bivariate smooth functions. Model selection and estimation are performed through a two-stage procedure based on an expectation-maximization (EM) algorithm. Simulation studies are conducted to evaluate the performance of the proposed methods, and their utility is demonstrated through several clinical applications.

Advanced Modeling of Longitudinal Spectroscopy Data (2014)
Kundu, Madan Gopal; Harezlak, Jaroslaw; Randolph, Timothy W.; Sarkar, Jyotirmoy; Steele, Gregory K.; Yiannoutsos, Constantin T.
Magnetic resonance (MR) spectroscopy is a neuroimaging technique widely used to quantify the concentrations of important metabolites in brain tissue. Imbalances in brain metabolite concentrations have been found to be associated with the development of neurological impairment, and MR spectroscopy is increasingly used as a diagnostic tool for neurological disorders. We developed statistical methodology to analyze MR spectroscopy data in the context of HIV-associated neurological disorder. First, we developed a novel method to study the association between a marker of neurological disorder and the MR spectrum of the brain, and how this association evolves over time. The problem fits into the framework of the scalar-on-function regression model, with each individual's spectrum as the functional predictor. We extended an existing cross-sectional scalar-on-function regression technique to the longitudinal setting. Advantages of the proposed method include (1) the ability to model a flexible, time-varying association between the response and the functional predictor and (2) the ability to incorporate prior information.
The second part of the research studies the influence of clinical and demographic factors on the progression of brain metabolites over time. To understand the influence of these factors in a fully nonparametric way, we proposed the LongCART algorithm, which constructs a regression tree from longitudinal data. Such a tree identifies smaller subpopulations, characterized by baseline factors, with differential longitudinal profiles, and hence reveals the influence of those baseline factors. Advantages of the LongCART algorithm are that it (1) controls the type-I error in determining the best split, (2) substantially reduces computation time, and (3) is applicable even when observations are taken at subject-specific time points. Finally, we carried out an in-depth analysis of longitudinal changes in brain metabolite concentrations in three brain regions, namely white matter, gray matter, and basal ganglia, in chronically infected HIV patients enrolled in the HIV Neuroimaging Consortium study. We used the LongCART algorithm to study the influence of important baseline clinical and demographic factors on these longitudinal profiles, in order to identify subgroups of patients at higher risk of neurological impairment.

Variable selection and structural discovery in joint models of longitudinal and survival data (2014)
He, Zangdong; Tu, Wanzhu; Yu, Zhangsheng; Liu, Hai; Song, Yiqing
Joint models of longitudinal and survival outcomes have been used with increasing frequency in clinical investigations. Correct specification of the fixed and random effects, as well as their functional forms, is essential for practical data analysis. However, no existing method meets this need in the joint model setting. In this dissertation, I describe a penalized likelihood-based method with adaptive least absolute shrinkage and selection operator (ALASSO) penalty functions for model selection.
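The code below is a minimal sketch of how an adaptive LASSO penalty performs selection. It uses ordinary linear regression in place of the joint longitudinal-survival likelihood, so the Cholesky reparameterization, group shrinkage, Gaussian quadrature, and EM steps described here are not reproduced. It fits a least-squares pilot estimate and sets weights w_j = 1/|b_j|^gamma. It then minimizes (1/2n)||y - Xb||^2 + lambda * sum_j w_j |b_j| by coordinate descent. The function names, lambda, gamma, and the simulated data are hypothetical.

```python
import random


def soft_threshold(a, t):
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def weighted_lasso_cd(X, y, lam, weights, n_iter=1000, tol=1e-10):
    """Minimize (1/2n)*||y - X b||^2 + lam * sum_j weights[j]*|b_j| by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # current residual y - X b (b starts at zero)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Inner product of column j with the partial residual that excludes b_j.
            rho = sum(row[j] * r for row, r in zip(X, resid)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i, row in enumerate(X):
                    resid[i] -= row[j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-step adaptive LASSO: least-squares pilot fit, then weights 1 / |b_pilot|^gamma."""
    p = len(X[0])
    pilot = weighted_lasso_cd(X, y, 0.0, [0.0] * p)  # lam = 0 gives ordinary least squares
    weights = [1.0 / (abs(bj) + eps) ** gamma for bj in pilot]
    return weighted_lasso_cd(X, y, lam, weights)


# Usage on simulated data (hypothetical true coefficients: two of five are non-zero).
random.seed(0)
true_beta = [3.0, 0.0, -2.0, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in true_beta] for _ in range(200)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0.0, 0.5) for row in X]
beta_hat = adaptive_lasso(X, y, lam=0.1)
print([round(b, 2) for b in beta_hat])
```

Large pilot coefficients receive small weights and are shrunk only slightly. Near-zero pilot coefficients receive large weights and are set exactly to zero. This is what lets an adaptive penalty select variables and estimate their effects in one step.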
By reparameterizing the variance components through a Cholesky decomposition, I introduce a group-shrinkage penalty function; the penalized likelihood is approximated by Gaussian quadrature and optimized by an EM algorithm. The functional forms of the independent effects are determined through a structural discovery procedure. Specifically, I first construct the model with penalized cubic B-splines and then decompose the B-spline into linear and nonlinear components by spectral decomposition. This decomposition expresses the model in mixed-effects form, so that the mixed-effects variable selection method can be used to perform structural discovery. Simulation studies show excellent performance. A clinical application illustrates the use of the proposed methods, and the analytical results demonstrate their usefulness.

Joint models for longitudinal and survival data (2014-07-11)
Yang, Lili; Gao, Sujuan; Yu, Menggang; Tu, Wanzhu; Callahan, Christopher M.; Zollinger, Terrell
Epidemiologic and clinical studies routinely collect longitudinal measures of multiple outcomes. These longitudinal outcomes can be used to establish the temporal order of relevant biological processes and their association with the onset of clinical symptoms. In the first part of this thesis, we proposed bivariate change point models for two longitudinal outcomes, with a focus on estimating the correlation between the two change points. We adopted a Bayesian approach for parameter estimation and inference. In the second part, we considered the situation in which a time-to-event outcome is collected along with multiple longitudinal biomarkers measured until the occurrence of the event or censoring. Joint models for longitudinal and time-to-event data can be used to estimate the association between the characteristics of the longitudinal measures over time and survival time.
We developed a maximum-likelihood method to jointly model multiple longitudinal biomarkers and a time-to-event outcome. In addition, we focused on predicting conditional survival probabilities and evaluating the predictive accuracy of multiple longitudinal biomarkers within the joint modeling framework. We assessed the performance of the proposed methods in simulation studies and applied them to data sets from two cohort studies.

Statistical analysis of clinical trial data using Monte Carlo methods (2014-07-11)
Han, Baoguang; Gao, Sujuan; Yu, Menggang; Yu, Zhangsheng; Liu, Yunlong
In medical research, data analysis often requires complex statistical methods for which no closed-form solutions are available. Under such circumstances, Monte Carlo (MC) methods have found many applications. In this dissertation, we proposed several novel statistical models that use MC methods. The first part focused on semicompeting risks data, in which a non-terminal event is subject to dependent censoring by a terminal event. Based on an illness-death multistate survival model, we proposed flexible random effects models. We further extended the model to a joint modeling setting in which semicompeting risks data and repeated marker data are analyzed simultaneously. Because the proposed methods involve high-dimensional integrals, Bayesian Markov chain Monte Carlo (MCMC) methods were used for estimation. The Bayesian approach also facilitates the prediction of individual patient outcomes. The proposed methods were demonstrated in both simulation and case studies. The second part focused on the re-randomization test, a nonparametric method that bases inference solely on the randomization procedure used in the clinical trial. With this type of inference, the MC method is often used to generate the null distribution of the treatment difference.
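The Monte Carlo re-randomization idea can be sketched as follows. The sketch assumes a simple random allocation with fixed group sizes as the randomization procedure and a difference in means as the test statistic. Neither minimization nor the proposed weighted re-randomization method is implemented, and all names are hypothetical. The randomization rule is passed in as a function, so a different procedure could be substituted.

```python
import random


def diff_in_means(outcomes, assignments):
    """Treatment-minus-control difference in mean outcome (assignment 1 = treated, 0 = control)."""
    treated = [y for y, a in zip(outcomes, assignments) if a == 1]
    control = [y for y, a in zip(outcomes, assignments) if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def permute_assignments(assignments, rng):
    """Re-randomization rule assumed here: random allocation with the observed group sizes.

    A trial randomized by minimization would instead re-run the minimization
    algorithm on the enrolled subjects; that procedure is not implemented here.
    """
    shuffled = list(assignments)
    rng.shuffle(shuffled)
    return shuffled


def rerandomization_test(outcomes, assignments, randomize=permute_assignments,
                         n_draws=10_000, seed=1):
    """Two-sided Monte Carlo re-randomization test for the difference in means."""
    rng = random.Random(seed)
    observed = diff_in_means(outcomes, assignments)
    # Monte Carlo approximation of the null distribution: re-run the randomization
    # procedure while holding the observed outcomes fixed.
    null_stats = [diff_in_means(outcomes, randomize(assignments, rng)) for _ in range(n_draws)]
    n_extreme = sum(1 for t in null_stats if abs(t) >= abs(observed))
    p_value = (n_extreme + 1) / (n_draws + 1)
    return observed, p_value, null_stats


# Usage on simulated data with a hypothetical treatment effect of 2 units.
data_rng = random.Random(42)
assignments = [1] * 20 + [0] * 20
outcomes = [data_rng.gauss(2.0 if a == 1 else 0.0, 1.0) for a in assignments]
observed, p_value, null_stats = rerandomization_test(outcomes, assignments)
print(round(observed, 2), p_value, round(sum(null_stats) / len(null_stats), 3))
```

Because the sketch returns the simulated null statistics, you can check where the null distribution is centered. That centering is the property the abstract reports breaking down under minimization with unbalanced allocation.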
However, an issue was recently discovered when subjects in a clinical trial were randomized to two treatments with unbalanced allocation according to the minimization algorithm, a randomization procedure frequently used in practice: the null distribution of the re-randomization test statistic was found not to be centered at zero, which compromised the power of the test. In this dissertation, we investigated the properties of the re-randomization test and proposed a weighted re-randomization method to overcome this issue. The proposed method was demonstrated through extensive simulation studies.

Single-index regression models (2015-05)
Wu, Jingwei; Tu, Wanzhu
Useful medical indices play important roles in predicting medical outcomes. Medical indices such as the well-known Body Mass Index (BMI) and the Charlson Comorbidity Index have been used extensively in research and clinical practice to quantify risks in individual patients. However, developing these indices is challenging, and their development has been based primarily on heuristic arguments. Statistically, most medical indices can be expressed as a function of a linear combination of individual variables and fitted by a single-index model. The single-index model retains latent nonlinear features of the data without the usual complications that come with increased dimensionality. In my dissertation, I propose a single-index model approach to analytically derive indices from observed data; the resulting index inherently correlates with the specific health outcomes of interest. The first part of this dissertation discusses the derivation of an index function for predicting one outcome from longitudinal data. A cubic-spline estimation scheme for the partially linear single-index mixed-effects model is proposed to incorporate the within-subject correlations among outcome measures contributed by the same subject.
A recursive algorithm based on optimizing a penalized least-squares estimating equation is derived. It is shown to work well both on simulated data and in deriving a new body mass measure for assessing hypertension risk in children. The second part of this dissertation extends the single-index model to a multivariate setting; specifically, a multivariate single-index model for longitudinal data is presented. An important feature of the proposed model is that it accommodates both the correlations among multivariate outcomes and those among repeated measurements from the same subject, via random effects that link the outcomes in a unified modeling structure. The model is illustrated by a new body mass index measure that simultaneously predicts systolic and diastolic blood pressure in children. The final part of this dissertation establishes the existence, root-n strong consistency, and asymptotic normality of the estimators in the multivariate single-index model under suitable conditions. These asymptotic results, which permit joint inference for all parameters, are assessed in finite-sample simulations.

Flexible models of time-varying exposures (2015-05)
Wang, Chenkun; Gao, Sujuan; Liu, Hai; Yu, Zhangsheng; Callahan, Christopher M.
With the availability of electronic medical records, medication dispensing data offer an unprecedented opportunity for researchers to explore complex relationships among long-term medication use, disease progression, and potential side effects in large patient populations. However, these data also pose challenges to existing statistical models because both medication exposure status and exposure intensity vary over time. This dissertation focused on flexible models to investigate the association between time-varying exposures and different types of outcomes. First, a penalized functional regression model was developed to estimate the effect of time-varying exposures on multivariate longitudinal outcomes.
Second, for survival outcomes, a regression spline-based model was proposed in the Cox proportional hazards (PH) framework to compare disease risk among different types of time-varying exposures. Finally, a penalized spline-based Cox PH model with functional interaction terms was developed to estimate interaction effects between multiple medication classes. Data from a primary care patient cohort are used to illustrate the proposed approaches in determining the association between antidepressant use and various outcomes.

Penalized spline modeling of the ex-vivo assays dose-response curves and the HIV-infected patients' bodyweight change (2015-06-05)
Sarwat, Samiha; Harezlak, Jaroslaw; Yiannoutsos, Constantin T.; Li, Xiaochun; Wools-Kaloustian, Kara K.
A semi-parametric approach incorporates both parametric and nonparametric functions in the model and is very useful when a fully parametric model is inadequate. The objective of this dissertation is to extend statistical methodology based on the semi-parametric modeling approach to analyze data in health science research. This dissertation has three parts. The first part discusses modeling the dose-response relationship with correlated data by introducing overall drug effects in addition to the deviation of each subject-specific curve from the population average. Here, a penalized spline regression method that models the smooth dose-response relationship is applied to data from studies monitoring malaria drug resistance through ex-vivo assays. The second part of the dissertation extends the SiZer map, a powerful exploratory visualization tool, to detect significant underlying features (increase, decrease, or no change) of a curve at various smoothing levels.
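The penalized spline regression used for the dose-response curves, and underlying PS-SiZer, can be sketched for a single curve with independent errors. The sketch assumes a truncated-power linear basis with a ridge penalty on the knot coefficients, one common penalized-spline formulation. It does not show the subject-specific deviations, the correlated-data structure, or the derivative-significance maps of PS-SiZer. The knot placement, the smoothing parameter `lam`, the simulated curve, and all function names are illustrative assumptions.

```python
import math
import random


def design_row(x, knots):
    """Truncated-power linear spline basis: [1, x, (x - k)_+ for each knot k]."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit_penalized_spline(xs, ys, knots, lam):
    """Minimize ||y - C beta||^2 + lam * (sum of squared knot coefficients); return the fitted curve."""
    C = [design_row(x, knots) for x in xs]
    p = len(C[0])
    A = [[sum(row[i] * row[j] for row in C) for j in range(p)] for i in range(p)]  # C'C
    for k in range(2, p):  # the intercept and linear slope are left unpenalized
        A[k][k] += lam
    rhs = [sum(row[i] * y for row, y in zip(C, ys)) for i in range(p)]  # C'y
    beta = solve(A, rhs)
    return lambda x: sum(c * b for c, b in zip(design_row(x, knots), beta))


def truth(x):
    """Hypothetical smooth, sigmoid-shaped dose-response curve used only for this demo."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))


random.seed(1)
xs = [random.random() for _ in range(300)]
ys = [truth(x) + random.gauss(0.0, 0.05) for x in xs]
knots = [k / 11 for k in range(1, 11)]
fhat = fit_penalized_spline(xs, ys, knots, lam=0.01)
print(round(fhat(0.5), 3), round(truth(0.5), 3))
```

Increasing `lam` shrinks the slope changes at the knots toward zero; as `lam` grows very large, the fit approaches a straight line. Varying this smoothing level is the dimension along which a SiZer-type map is drawn.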
Here, Penalized Spline Significant Zero Crossings of Derivatives (PS-SiZer), which is based on penalized spline regression, is introduced to investigate significant features in correlated data arising from longitudinal settings. The third part of the dissertation applies the proposed PS-SiZer methodology to HIV data. PS-SiZer is a graphical tool for exploring structure in curves. It maps the regions where a curve is significantly increasing, significantly decreasing, or not significantly changing. Here it is used to explore how durable significant weight change is over time. PS-SiZer maps show where the rate of weight change is significant under two ART regimens at various levels of smoothing. A penalized spline regression model at an optimal smoothing level is then applied to estimate, for each treatment regimen, the first time point at which weight no longer increases.

Multivariate finite mixture latent trajectory models with application to dementia studies (2015-07-02)
Lai, Dongbing; Gao, Sujuan; Xu, Huiping; Foroud, Tatiana M.; Katz, Barry P.; Koller, Daniel L.
Dementia studies often collect multiple longitudinal neuropsychological measures to examine patients' decline across a number of cognitive domains. Dementia patients show considerable heterogeneity in individual trajectories of cognitive decline: some decline rapidly after diagnosis, while others decline more slowly or remain stable for several years. In the first part of this dissertation, a multivariate finite mixture latent trajectory model was proposed to identify longitudinal patterns of cognitive decline in multiple cognitive domains, with multiple tests within each domain. The expectation-maximization (EM) algorithm was implemented for parameter estimation, and posterior probabilities estimated from the model were used to predict latent class membership.
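The EM step that produces posterior class-membership probabilities can be illustrated with a simplified sketch. It uses a univariate Gaussian mixture in place of the multivariate latent trajectory model, so no trajectories, cognitive domains, or multiple tests are modeled. The starting values, the simulated data, and all function names are assumptions for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def e_step(xs, means, sds, weights):
    """Posterior class-membership probabilities and the observed-data log-likelihood."""
    posteriors, loglik = [], 0.0
    for x in xs:
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(joint)
        loglik += math.log(total)
        posteriors.append([j / total for j in joint])
    return posteriors, loglik


def em_mixture(xs, means, sds, weights, n_iter=500, tol=1e-8):
    """EM for a univariate K-component Gaussian mixture, starting from the given values."""
    means, sds, weights = list(means), list(sds), list(weights)
    n, prev_loglik = len(xs), -math.inf
    for _ in range(n_iter):
        posteriors, loglik = e_step(xs, means, sds, weights)
        if loglik - prev_loglik < tol:  # EM never decreases the log-likelihood
            break
        prev_loglik = loglik
        # M-step: posterior-weighted updates of proportions, means and SDs.
        for k in range(len(means)):
            nk = sum(p[k] for p in posteriors)
            weights[k] = nk / n
            means[k] = sum(p[k] * x for p, x in zip(posteriors, xs)) / nk
            sds[k] = math.sqrt(sum(p[k] * (x - means[k]) ** 2 for p, x in zip(posteriors, xs)) / nk)
    posteriors, _ = e_step(xs, means, sds, weights)  # posteriors at the final parameters
    return means, sds, weights, posteriors


# Usage: two simulated latent classes (hypothetical sizes 150 and 100).
random.seed(3)
labels = [0] * 150 + [1] * 100
xs = [random.gauss(0.0 if c == 0 else 5.0, 1.0) for c in labels]
means, sds, weights, posteriors = em_mixture(xs, means=[-1.0, 6.0], sds=[1.0, 1.0], weights=[0.5, 0.5])
predicted = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
accuracy = sum(a == b for a, b in zip(predicted, labels)) / len(labels)
print([round(m, 2) for m in means], [round(w, 2) for w in weights], accuracy)
```

Each observation is assigned to the class with the largest posterior probability. This mirrors, in a much simpler setting, how a fitted mixture model predicts latent class membership.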
Simulation studies demonstrated satisfactory performance of the proposed approach. In the second part, a simulation study compared the performance of information-based criteria for selecting the number of latent classes. Commonly used model selection criteria, including the Akaike information criterion (AIC) and Bayesian information criterion (BIC), as well as the consistent AIC (CAIC), sample-size adjusted BIC (SABIC), and integrated classification likelihood criterion (ICLBIC), were included in the comparison. SABIC performed uniformly better in all simulation scenarios and was hence the preferred criterion for the proposed model. In the third part of the dissertation, the multivariate finite mixture latent trajectory model was extended to situations in which the true latent class membership is known for a subset of patients. The proposed models were used to analyze data from the Uniform Data Set (UDS), collected from Alzheimer's Disease Centers across the country, to identify cognitive decline patterns among patients with dementia.

Statistical methods to study heterogeneity of treatment effects (2015-09-25)
Taft, Lin H.; Shen, Changyu; Li, Xiaochun; Chen, Peng-Sheng; Wessel, Jennifer
Randomized studies are designed to estimate the average treatment effect (ATE) of an intervention. Individuals may derive effects that differ quantitatively, or even qualitatively, from the ATE; this is called heterogeneity of treatment effect. It is important to detect heterogeneity in treatment responses and to identify the different sub-populations. This dissertation discusses two corresponding statistical methods: a hypothesis testing procedure and a mixture-model-based approach. The hypothesis testing procedure tests for the existence of a treatment effect in sub-populations. The test is nonparametric and can be applied to all types of outcome measures.
A key innovation of this test is building a stochastic search into the test statistic to detect signals that may not be linearly related to the multiple covariates. Simulations were performed to compare the proposed test with existing methods, and a power calculation strategy was developed for using the test at the design stage. The mixture-model-based approach was developed to identify and study sub-populations with different treatment effects from an intervention. A latent binary variable indicates whether or not a subject belongs to a sub-population with average treatment benefit. The mixture model combines a logistic formulation of the latent variable with proportional hazards models, and its parameters were estimated by the EM algorithm. The properties of the estimators were then studied through simulations. Finally, all of the above methods were applied to a randomized study in a low-ejection-fraction population that compared the implantable cardioverter defibrillator (ICD) with conventional medical therapy in reducing total mortality.