Sample Size Determination for Subsampling in the Analysis of Big Data, Multiplicative Models for Confidence Intervals and Free-Knot Changepoint Models

Zhang, Sheng

Files

zhang-thesis-final.pdf (1.29 MB)

Date

2024-05

Authors

Zhang, Sheng

Language

American English

Committee Chair

Peng, Hanxiang

Committee Members

Tan, Fei
Sarkar, Jyoti
Boukai, Ben

Degree

Ph.D.

Degree Year

2024

Department

Mathematical Sciences

Grantor

Purdue University

Abstract

The dissertation consists of three parts. In the first part, motivated by subsampling in the analysis of Big Data and by data-splitting in machine learning, we present sample size determination for multidimensional parameters.

In the second part, we propose a novel approach to constructing confidence intervals based on improved concentration inequalities. We provide the missing factor for the tail probability of a random variable, generalizing Talagrand’s (1995) result on the missing factor in Hoeffding’s inequalities. We give a procedure for constructing the confidence intervals and illustrate it with simulations.

In the third part, we study irregular change-point models using free-knot splines. We prove the consistency and asymptotic normality of the least squares estimators for irregular models in which the linear spline is not differentiable. Simulations are carried out to explore the numerical properties of the proposed models. The results are used to analyze US COVID-19 data.

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

Big Data, Subsample, A-optimal, Changepoint Model

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/41432
https://doi.org/10.7912/0KQ6-F914

Collections

Mathematical Sciences Department Theses and Dissertations
