Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data

Date
2018
Language
English
Abstract

Divide-and-conquer is a powerful approach for the analysis of large and massive data sets. In the nonparametric regression setting, although various theoretical frameworks have been established to achieve optimality in estimation or hypothesis testing, how to choose the tuning parameter in a practically effective way is still an open problem. In this paper, we propose a data-driven procedure based on divide-and-conquer for selecting the tuning parameters in kernel ridge regression by modifying the popular Generalized Cross-validation (GCV; Wahba, 1990). While the proposed criterion is computationally scalable for massive data sets, it is also shown under mild conditions to be asymptotically optimal in the sense that minimizing the proposed distributed-GCV (dGCV) criterion is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.
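To make the idea concrete, below is a minimal numerical sketch in Python/NumPy of divide-and-conquer kernel ridge regression scored by a GCV-style criterion. The classical GCV of Wahba (1990) for a fit with hat matrix A(λ) is GCV(λ) = N⁻¹‖y − A(λ)y‖² / (1 − tr(A(λ))/N)²; the sketch simply fits KRR on each subsample, averages the s local estimators, and plugs the averaged fit and the pooled hat-matrix traces into a ratio of the same shape. All names here (dgcv_score, gaussian_kernel, the choice of denominator) are illustrative assumptions, not the paper's notation; the precise dGCV criterion and the conditions for its optimality are given in the paper itself.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def dgcv_score(X_parts, y_parts, lam, kernel=gaussian_kernel):
    """GCV-style score for divide-and-conquer kernel ridge regression.

    Illustrative sketch only: each subsample j gets its own KRR fit
    alpha_j = (K_j + n_j * lam * I)^{-1} y_j, the s local fits are
    averaged, and the averaged fit enters a GCV-type ratio. With s = 1
    this reduces to classical GCV (Wahba, 1990); the paper's exact
    dGCV criterion may differ in its details.
    """
    N = sum(len(y) for y in y_parts)

    # Local KRR coefficients and hat-matrix traces, one pass per subsample.
    alphas, trace_sum = [], 0.0
    for Xj, yj in zip(X_parts, y_parts):
        nj = len(yj)
        Kj = kernel(Xj, Xj)
        M = Kj + nj * lam * np.eye(nj)
        alphas.append(np.linalg.solve(M, yj))
        # tr(A_j) where A_j = K_j (K_j + n_j*lam*I)^{-1}, via a cyclic swap.
        trace_sum += np.trace(np.linalg.solve(M, Kj))

    # Residual sum of squares of the *averaged* estimator over all points.
    # (The naive cross-kernel double loop is for clarity, not speed.)
    rss = 0.0
    for Xj, yj in zip(X_parts, y_parts):
        f_bar = np.mean(
            [kernel(Xj, Xk) @ a_k for Xk, a_k in zip(X_parts, alphas)],
            axis=0,
        )
        rss += np.sum((yj - f_bar) ** 2)

    return (rss / N) / (1.0 - trace_sum / N) ** 2
```

Minimizing this score over a grid, e.g. min(np.logspace(-6, 0, 25), key=lambda l: dgcv_score(X_parts, y_parts, l)), then plays the role of the tuning-parameter selection step; the paper's result is that minimizing its dGCV criterion asymptotically tracks the true global conditional empirical loss of the averaged estimator.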

Cite As
Xu, G., Shang, Z., & Cheng, G. (2018). Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data. International Conference on Machine Learning, 5483–5491. Retrieved from http://proceedings.mlr.press/v80/xu18f.html
Journal
International Conference on Machine Learning
Type
Article
Version
Author's manuscript