New Applications of Spline-Based Learning Algorithms

Zhou, Junyi

Files

Zhou_iupui_0104D_10535.pdf (1.33 MB)

Date

2021-10

Authors

Zhou, Junyi

Language

American English

Committee Chair

Tu, Wanzhu

Zhang, Ying

Committee Members

Cao, Sha
Zhang, Chi
Bakoyannis, Giorgos

Degree

Ph.D.

Degree Year

2021

Department

Biostatistics

Grantor

Indiana University

Abstract

Statistical learning methods are affecting human society and our daily lives in unprecedented ways. Most of these learning methods are motivated by practical applications, and they in turn are being used to solve real-world problems. Although generally accepted principles exist for the development of learning methods, new models and algorithms tend to emerge not as a result of theoretical extensions but as a consequence of the scientific, technological, and societal needs of the world. In view of application-motivated method development, two classes of statistical learning methods are described: One addressing the needs of precision medicine and the other exploring the underlying longitudinal data structure in an unsupervised manner. A common thread in the two methods is combining spline-based models with learning algorithms to improve analytical accuracy. The challenges in optimizing treatment for individual patients are first addressed. Specifically, therapeutic optimization must be based on a good causal understanding of the treatment effects. Furthermore, given the multiple treatment options available, recommendations must be consistent regardless of the reference treatment. To address the issue of inconsistent recommendations in a newer R-learner method, a simplex R-learning algorithm to help select the best treatment for individual patients is presented. The algorithm was tested, and the analytical results of the data from the Systolic Blood Pressure Intervention Trial (SPRINT) are presented. The proposed method provided recommendations consistent with the current clinical guidelines for hypertension treatment. The second part of this dissertation addresses the clustering of longitudinal data with sparse and irregular observations. Through simulation studies, the algorithm is demonstrated to have superior clustering accuracy and numerical efficiency to those of the existing methods.
In addition, the algorithm can be easily extended to multiple-outcome longitudinal data with little additional computational cost, and is capable of detecting the correct number of clusters when extremely unbalanced cluster sizes exist. The algorithm was applied to a 12-year multi-site observational study (PREDICT-HD) to investigate the disease progression patterns of Huntington's disease (HD). Finally, an R package, ClusterLong, was developed to provide a tool for the public use of this algorithm. The tool was incorporated into an R Shiny application to allow users unfamiliar with R to access the method.

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/26959
http://dx.doi.org/10.7912/C2/2821

Collections

Biostatistics Department Theses and Dissertations
