A provable smoothing approach for high dimensional generalized regression with applications in genomics

Date
2017
Language
American English
Found At
Institute of Mathematical Statistics
Abstract

In many applications, linear models fit the data poorly. This article studies an appealing alternative, the generalized regression model. This model only assumes that there exists an unknown monotonically increasing link function connecting the response $Y$ to a single index $\boldsymbol{X}^{\mathsf{T}}\boldsymbol{\beta}^{*}$ of explanatory variables $\boldsymbol{X} \in \mathbb{R}^{d}$. The generalized regression model is flexible and covers many widely used statistical models. It fits the data generating mechanisms well in many real problems, which makes it useful in a variety of applications where regression models are regularly employed. In low dimensions, rank-based M-estimators are recommended to deal with the generalized regression model, giving root-$n$ consistent estimators of $\boldsymbol{\beta}^{*}$. Applications of these estimators to high dimensional data, however, are questionable. This article studies, both theoretically and practically, a simple yet powerful smoothing approach to handle the high dimensional generalized regression model. Theoretically, a family of smoothing functions is provided, and the amount of smoothing necessary for efficient inference is carefully calculated. Practically, our study is motivated by an important and challenging scientific problem: decoding gene regulation by predicting transcription factors that bind to cis-regulatory elements. Applying our proposed method to this problem shows substantial improvement over the state-of-the-art alternative in real data.
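As a rough illustration of the idea of smoothing a rank-based objective, the sketch below replaces the indicator in a pairwise rank correlation criterion with a sigmoid. The sigmoid, the bandwidth `h`, the function name `smoothed_rank_objective`, and the simulated data are all assumptions made for illustration; the abstract does not specify the article's family of smoothing functions or its penalized high dimensional estimator, and this is not that method.

```python
import numpy as np


def smoothed_rank_objective(beta, X, y, h=0.1):
    """Average over ordered pairs (i, j), i != j, of
    1{y_i > y_j} * sigmoid((x_i - x_j)^T beta / h).

    The sigmoid is an assumed stand-in for the indicator
    1{x_i^T beta > x_j^T beta}; h is a hypothetical bandwidth.
    """
    index = X @ beta
    diff = (index[:, None] - index[None, :]) / h
    # sigmoid(t) written as 0.5 * (1 + tanh(t / 2)) to avoid overflow
    smooth = 0.5 * (1.0 + np.tanh(0.5 * diff))
    greater = (y[:, None] > y[None, :]).astype(float)
    n = len(y)
    return float((greater * smooth).sum() / (n * (n - 1)))


# Simulated data (hypothetical): a monotone link (exp) applied to a single index.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = np.exp(X @ beta_true) + 0.1 * rng.standard_normal(n)

# The objective should favor the true direction over its negation.
val_true = smoothed_rank_objective(beta_true, X, y)
val_flipped = smoothed_rank_objective(-beta_true, X, y)
print(val_true, val_flipped)
```

Because the link function is only assumed to be monotone, the objective depends on the data through the ordering of the responses alone, which is why no link function appears in the code. The smoothed criterion is differentiable in `beta`, unlike the indicator-based version.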

Cite As
Han, F., Ji, H., Ji, Z., & Wang, H. (2017). A provable smoothing approach for high dimensional generalized regression with applications in genomics. Electronic Journal of Statistics, 11(2), 4347–4403. https://doi.org/10.1214/17-EJS1352
ISSN
1935-7524
Journal
Electronic Journal of Statistics
Type
Article
Version
Final published version