Sample Size Determination in Multivariate Parameters With Applications to Nonuniform Subsampling in Big Data High Dimensional Linear Regression

Date
2021-12
Authors
Language
American English
Embargo Lift Date
Department
Committee Chair
Degree
Ph.D.
Degree Year
2021
Department
Mathematical Sciences
Grantor
Purdue University
Journal Title
Journal ISSN
Volume Title
Found At
Abstract

Subsampling is an important method in the analysis of Big Data. Subsample size determination (SSSD) plays a crucial part in extracting information from data and in breaking the challenges resulted from huge data sizes. In this thesis, (1) Sample size determination (SSD) is investigated in multivariate parameters, and sample size formulas are obtained for multivariate normal distribution. (2) Sample size formulas are obtained based on concentration inequalities. (3) Improved bounds for McDiarmid’s inequalities are obtained. (4) The obtained results are applied to nonuniform subsampling in Big Data high dimensional linear regression. (5) Numerical studies are conducted.

The sample size formula in univariate normal distribution is a melody in elementary statistics. It appears that its generalization to multivariate normal (or more generally multivariate parameters) hasn’t been caught much attention to the best of our knowledge. In this thesis, we introduce a definition for SSD, and obtain explicit formulas for multivariate normal distribution, in gratifying analogy of the sample size formula in univariate normal.

Commonly used concentration inequalities provide exponential rates, and sample sizes based on these inequalities are often loose. Talagrand (1995) provided the missing factor to sharpen these inequalities. We obtained the numeric values of the constants in the missing factor and slightly improved his results. Furthermore, we provided the missing factor in McDiarmid’s inequality. These improved bounds are used to give shrunken sample sizes.

Description
Indiana University-Purdue University Indianapolis (IUPUI)
item.page.description.tableofcontents
item.page.relation.haspart
Cite As
ISSN
Publisher
Series/Report
Sponsorship
Major
Extent
Identifier
Relation
Journal
Source
Alternative Title
Type
Thesis
Number
Volume
Conference Dates
Conference Host
Conference Location
Conference Name
Conference Panel
Conference Secretariat Location
Version
Full Text Available at
This item is under embargo {{howLong}}