A-Optimal Subsampling For Big Data General Estimating Equations

DC Field | Value | Language
dc.contributor.advisor | Peng, Hanxiang
dc.contributor.advisor | Rubchinsky, Leonid
dc.contributor.author | Cheung, Chung Ching
dc.contributor.other | Boukai, Benzion
dc.contributor.other | Lin, Guang
dc.contributor.other | Al Hasan, Mohammad
dc.date.accessioned | 2019-07-30T12:41:37Z
dc.date.available | 2019-07-30T12:41:37Z
dc.date.issued | 2019-08
dc.degree.date | 2019 | en_US
dc.degree.discipline | Mathematical Sciences | en
dc.degree.grantor | Purdue University | en_US
dc.degree.level | Ph.D. | en_US
dc.description | Indiana University-Purdue University Indianapolis (IUPUI) | en_US
dc.description.abstract | A significant hurdle in analyzing big data is the lack of effective technology and statistical inference methods. Subsampling is a popular approach for analyzing data with large sample sizes, and many subsampling probabilities have been proposed in the literature for linear models (Ma et al., 2015). In this dissertation, we focus on generalized estimating equations (GEE) with big data and derive the asymptotic normality of the estimators with and without resampling. We also give asymptotic representations of the bias of both estimators and show that the bias becomes significant when the data are high-dimensional. We further present a novel subsampling method, called A-optimal subsampling, which is derived by minimizing the trace of certain dispersion matrices (Peng and Tan, 2018), and we derive the asymptotic normality of the estimator based on A-optimal subsampling. We conduct extensive simulations on large, high-dimensional samples to evaluate the performance of the proposed methods, using MSE as the criterion. High-dimensional data are investigated further, and the simulations show that minimizing the asymptotic variance does not imply minimizing the MSE, because the bias is not negligible. We apply the proposed subsampling method to a real data set, gas sensor data with more than four million data points. In both the simulations and the real data analysis, our A-optimal method outperforms the traditional uniform subsampling method. | en_US
dc.identifier.uri | https://hdl.handle.net/1805/20022
dc.identifier.uri | http://dx.doi.org/10.7912/C2/2408
dc.language.iso | en_US | en_US
dc.subject | Subsampling | en_US
dc.subject | Big Data | en_US
dc.subject | A-optimality | en_US
dc.subject | General Estimating Equations | en_US
dc.subject | High Dimensional Statistics | en_US
dc.title | A-Optimal Subsampling For Big Data General Estimating Equations | en_US
dc.type | Thesis | en
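The abstract describes A-optimal subsampling only at a high level: sampling probabilities are chosen to minimize the trace of the estimator's asymptotic dispersion matrix, and the result is compared with uniform subsampling by MSE. The GEE-specific probabilities derived in the thesis are not given in this record, so the Python sketch below swaps in a simpler, well-known special case purely as an illustration: inverse-probability-weighted least squares for a linear model. There the asymptotic variance is proportional to M^{-1} (sum_i e_i^2 x_i x_i^T / pi_i) M^{-1} with M = X^T X, whose trace sum_i e_i^2 ||M^{-1} x_i||^2 / pi_i is minimized (subject to sum_i pi_i = 1) by pi_i proportional to |e_i| ||(X^T X)^{-1} x_i||. The helper names, the two-stage pilot-then-main scheme, and the simulated data are hypothetical choices made for this sketch, not the author's implementation.

```python
import random


def inverse(M):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    p = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(M)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[p:] for row in A]


def gram(X, w):
    """Return sum_i w_i x_i x_i^T as a list of lists."""
    p = len(X[0])
    return [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(p)]
            for a in range(p)]


def weighted_ols(X, y, w):
    """Solve (sum_i w_i x_i x_i^T) beta = sum_i w_i x_i y_i."""
    p = len(X[0])
    Minv = inverse(gram(X, w))
    v = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y)) for a in range(p)]
    return [sum(Minv[a][b] * v[b] for b in range(p)) for a in range(p)]


def a_optimal_probs(X, y, beta):
    """Hypothetical linear-model case: pi_i ~ |y_i - x_i^T beta| * ||(X^T X)^{-1} x_i||."""
    p = len(X[0])
    Minv = inverse(gram(X, [1.0] * len(X)))
    scores = []
    for xi, yi in zip(X, y):
        resid = abs(yi - sum(b * x for b, x in zip(beta, xi)))  # pilot residual
        g = [sum(Minv[a][b] * xi[b] for b in range(p)) for a in range(p)]
        scores.append(resid * sum(t * t for t in g) ** 0.5)
    total = sum(scores)
    return [s / total for s in scores]


def subsample_estimate(X, y, probs, r, rng):
    """Draw r indices with replacement using probs; fit OLS weighted by 1 / pi_i."""
    idx = rng.choices(range(len(y)), weights=probs, k=r)
    return weighted_ols([X[i] for i in idx], [y[i] for i in idx],
                        [1.0 / probs[i] for i in idx])


# Simulated example (hypothetical data): intercept, Gaussian and exponential covariates.
rng = random.Random(2019)
n, beta_true = 20000, [1.0, -2.0, 0.5]
X = [[1.0, rng.gauss(0.0, 1.0), rng.expovariate(1.0)] for _ in range(n)]
y = [sum(b * x for b, x in zip(beta_true, xi)) + rng.gauss(0.0, 1.0) for xi in X]

pilot = subsample_estimate(X, y, [1.0 / n] * n, 500, rng)  # stage 1: uniform pilot
probs = a_optimal_probs(X, y, pilot)                       # A-optimal probabilities
beta_a = subsample_estimate(X, y, probs, 1000, rng)        # stage 2: weighted fit
print("pilot:", pilot)
print("A-optimal:", beta_a)
```

Replacing `probs` in the second stage with `[1.0 / n] * n` gives a uniform-subsampling baseline; repeating both fits over many seeds and averaging squared errors would give an empirical MSE comparison of the kind the abstract describes. This sketch does not attempt to model the high-dimensional bias that the abstract discusses.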
Files
Original bundle
Name: Purdue_University_Thesis_Chung_Ching_Cheung.pdf
Size: 1.61 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission