Browsing by Author "Shang, Zuofeng"
Now showing 1 - 6 of 6
Item: Computational Limits of A Distributed Algorithm for Smoothing Spline (2017)
Shang, Zuofeng; Cheng, Guang; Mathematical Sciences, School of Science

In this paper, we explore the statistical versus computational trade-off to address a basic question in the application of a distributed algorithm: what is the minimal computational cost of obtaining statistical optimality? In the smoothing spline setup, we observe a phase transition phenomenon for the number of deployed machines, which ends up being a simple proxy for computing cost. Specifically, a sharp upper bound for the number of machines is established: when the number is below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; otherwise, statistical optimality becomes impossible. These sharp bounds partly capture intrinsic computational limits of the distributed algorithm considered in this paper, and turn out to be fully determined by the smoothness of the regression function. As a side remark, we argue that sample splitting may be viewed as an alternative form of regularization, playing a similar role to the smoothing parameter.

Item: An MM algorithm for estimation of a two component semiparametric density mixture with a known component (IMS, 2018)
Shen, Zhou; Levine, Michael; Shang, Zuofeng; Mathematical Sciences, School of Science

We consider a semiparametric mixture of two univariate density functions where one of them is known while the weight and the other function are unknown. We do not assume any additional structure on the unknown density function. For this mixture model, we derive a new sufficient identifiability condition and pinpoint a specific class of distributions describing the unknown component for which this condition is mostly satisfied. We also suggest a novel approach to estimation of this model, based on the idea of applying a maximum smoothed likelihood to what would otherwise have been an ill-posed problem.
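In code, the mixture model just described can be sketched as follows. This is a minimal illustration assuming a standard normal known component and a made-up Exp(1) unknown component; all names here are hypothetical, not from the paper:

```python
import numpy as np

def norm_pdf(x):
    """Standard normal density (the known component f0 in this sketch)."""
    return np.exp(-np.square(x) / 2.0) / np.sqrt(2.0 * np.pi)

def mixture_pdf(x, p, f_unknown):
    """Semiparametric mixture g(x) = p*f0(x) + (1-p)*f(x): f0 is known,
    while the weight p and the density f are the unknowns to be estimated."""
    return p * norm_pdf(x) + (1.0 - p) * f_unknown(x)

# Illustration only: pretend the unknown component is an Exp(1) density.
exp_pdf = lambda x: np.exp(-x) * (x >= 0)
g0 = mixture_pdf(0.0, 0.3, exp_pdf)
```

Recovering p and f_unknown from observations of g alone is precisely the ill-posed estimation problem that the smoothed-likelihood approach is meant to tame.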
We introduce an iterative MM (Majorization-Minimization) algorithm that estimates all of the model parameters. We establish that the algorithm possesses a descent property with respect to a log-likelihood objective functional and prove that the algorithm indeed converges. Finally, we illustrate the performance of our algorithm in a simulation study and apply it to a real dataset.

Item: Optimal nonparametric inference via deep neural network (Elsevier, 2022-01)
Liu, Ruiqi; Boukai, Ben; Shang, Zuofeng; Mathematical Sciences, School of Science

Deep neural networks are a state-of-the-art method in modern science and technology. Much statistical literature has been devoted to understanding their performance in nonparametric estimation, but the existing results are suboptimal due to a redundant logarithmic sacrifice. In this paper, we show that such log-factors are not necessary. We derive upper bounds for the L2 minimax risk in nonparametric estimation. Sufficient conditions on network architectures are provided such that the upper bounds become optimal (without the log-sacrifice). Our proof relies on an explicitly constructed network estimator based on tensor-product B-splines. We also derive asymptotic distributions for the constructed network and a related hypothesis testing procedure. The testing procedure is further proved to be minimax optimal under suitable network architectures.

Item: Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data (2018)
Xu, Ganggang; Shang, Zuofeng; Cheng, Guang; Mathematical Sciences, School of Science

Divide-and-conquer is a powerful approach for large and massive data analysis. In the nonparametric regression setting, although various theoretical frameworks have been established to achieve optimality in estimation or hypothesis testing, how to choose the tuning parameter in a practically effective way is still an open problem.
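The divide-and-conquer estimator in question (fit the same kernel ridge regression on each subsample, then average the fitted functions) can be sketched as follows. The Gaussian kernel, its bandwidth, and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def krr_fit(X, y, lam, gamma=50.0):
    """Kernel ridge regression with a Gaussian kernel on 1-D inputs.
    Returns a predictor; lam is the regularization (tuning) parameter."""
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)
    return lambda x: np.exp(-gamma * (np.atleast_1d(x)[:, None] - X[None, :]) ** 2) @ alpha

def dac_krr(X, y, n_machines, lam, seed=0):
    """Divide-and-conquer: randomly split the data across machines, fit KRR
    on each subsample, and average the resulting predictors."""
    idx = np.random.default_rng(seed).permutation(len(X))
    fits = [krr_fit(X[part], y[part], lam)
            for part in np.array_split(idx, n_machines)]
    return lambda x: np.mean([f(x) for f in fits], axis=0)
```

Note that every machine in this sketch shares the same tuning parameter lam; choosing it in a data-driven, scalable way is exactly the open problem described above.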
In this paper, we propose a data-driven procedure based on divide-and-conquer for selecting the tuning parameters in kernel ridge regression by modifying the popular generalized cross-validation (GCV; Wahba, 1990). While the proposed criterion is computationally scalable for massive data sets, it is also shown, under mild conditions, to be asymptotically optimal in the sense that minimizing the proposed distributed-GCV (dGCV) criterion is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.

Item: Statistical Inference on Panel Data Models: A Kernel Ridge Regression Method (Taylor & Francis, 2021)
Zhao, Shunan; Liu, Ruiqi; Shang, Zuofeng; Mathematical Sciences, School of Science

We propose statistical inferential procedures for nonparametric panel data models with interactive fixed effects in a kernel ridge regression framework. Compared with traditional sieve methods, our method is automatic in the sense that it does not require the choice of basis functions and truncation parameters. The model complexity is controlled by a continuous regularization parameter, which can be selected automatically by generalized cross-validation. Based on empirical process theory and functional analysis tools, we derive the joint asymptotic distributions for the estimators in the heterogeneous setting. These joint asymptotic results are then used to construct confidence intervals for the regression means and prediction intervals for future observations, both being the first provably valid intervals in the literature. The marginal asymptotic normality of the functional estimators in a homogeneous setting is also obtained. Our estimators can also be readily modified and applied to other widely used semiparametric models, such as partially linear models.
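The generalized cross-validation selection mentioned above can be sketched in its classical single-machine form, where the hat matrix is A(lam) = K(K + n*lam*I)^{-1}; the Gaussian kernel and all names below are illustrative assumptions, not the papers' implementations:

```python
import numpy as np

def gcv_score(K, y, lam):
    """Classical GCV score for kernel ridge regression:
    GCV(lam) = n * ||(I - A)y||^2 / tr(I - A)^2,  A = K (K + n*lam*I)^{-1}."""
    n = len(y)
    A = K @ np.linalg.inv(K + n * lam * np.eye(n))
    resid = y - A @ y
    return n * (resid @ resid) / np.trace(np.eye(n) - A) ** 2

def select_lambda(K, y, grid):
    """Pick the regularization parameter on the grid minimizing GCV."""
    return min(grid, key=lambda lam: gcv_score(K, y, lam))
```

Oversmoothing (huge lam) leaves the signal in the residuals, while undersmoothing shrinks tr(I - A) toward zero; the GCV ratio penalizes both, which is what makes it a workable automatic selector.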
Simulation and real data analyses demonstrate the advantages of our method.

Item: Variance Change Point Detection Under a Smoothly-Changing Mean Trend with Application to Liver Procurement (Taylor & Francis, 2018)
Gao, Zhenguo; Shang, Zuofeng; Du, Pang; Mathematical Sciences, School of Science

The literature on change point analysis mostly requires a sudden change in the data distribution, either in a few parameters or in the distribution as a whole. We are interested in the scenario where the variance of the data may make a significant jump while the mean changes in a smooth fashion. The motivation is a liver procurement experiment monitoring organ surface temperature. Blindly applying the existing methods to this example can yield erroneous change point estimates, since the smoothly changing mean violates the sudden-change assumption. We propose a penalized weighted least-squares approach with an iterative estimation procedure that integrates variance change point detection and smooth mean function estimation. The procedure starts with a consistent initial mean estimate that ignores the variance heterogeneity. Given the variance components, the mean function is estimated by smoothing splines as the minimizer of the penalized weighted least squares. Given the mean function, we propose a likelihood ratio test statistic for identifying the variance change point. The null distribution of the test statistic is derived, together with the rates of convergence of all the parameter estimates. Simulations show excellent performance of the proposed method. The application analysis offers numerical support for noninvasive organ viability assessment by surface temperature monitoring. Supplementary materials for this article are available online.
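Once the smooth mean trend has been removed, the variance change point scan over the residuals can be sketched as a plain Gaussian likelihood-ratio scan; this is a simplified illustration with hypothetical names, not the paper's exact test statistic:

```python
import numpy as np

def variance_change_scan(resid, trim=10):
    """Scan candidate change points k in detrended residuals. For each k,
    the Gaussian log-likelihood ratio compares a single variance for the
    whole series against separate variances before and after k; the k
    maximizing the statistic is the estimated variance change point."""
    n = len(resid)
    s_all = np.var(resid)
    best_k, best_stat = None, -np.inf
    for k in range(trim, n - trim):   # keep both segments away from the edges
        s1, s2 = np.var(resid[:k]), np.var(resid[k:])
        stat = n * np.log(s_all) - k * np.log(s1) - (n - k) * np.log(s2)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat
```

The point of the iterative procedure above is that this scan is only trustworthy after detrending: running it on raw data with a smoothly drifting mean inflates the within-segment variances and biases the detected change point.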