A provable smoothing approach for high dimensional generalized regression with applications in genomics

dc.contributor.author: Han, Fang
dc.contributor.author: Ji, Hongkai
dc.contributor.author: Ji, Zhicheng
dc.contributor.author: Wang, Honglang
dc.contributor.department: Mathematical Sciences, School of Science [en_US]
dc.date.accessioned: 2018-06-29T21:23:35Z
dc.date.available: 2018-06-29T21:23:35Z
dc.date.issued: 2017
dc.description.abstract: In many applications, linear models fit the data poorly. This article studies an appealing alternative, the generalized regression model. This model only assumes that there exists an unknown monotonically increasing link function connecting the response $Y$ to a single index $\boldsymbol{X}^{\mathsf{T}}\boldsymbol{\beta}^{*}$ of explanatory variables $\boldsymbol{X} \in \mathbb{R}^{d}$. The generalized regression model is flexible and covers many widely used statistical models. It fits the data generating mechanisms well in many real problems, which makes it useful in a variety of applications where regression models are regularly employed. In low dimensions, rank-based M-estimators are recommended to deal with the generalized regression model, giving root-$n$ consistent estimators of $\boldsymbol{\beta}^{*}$. Applications of these estimators to high dimensional data, however, are questionable. This article studies, both theoretically and practically, a simple yet powerful smoothing approach to handle the high dimensional generalized regression model. Theoretically, a family of smoothing functions is provided, and the amount of smoothing necessary for efficient inference is carefully calculated. Practically, our study is motivated by an important and challenging scientific problem: decoding gene regulation by predicting transcription factors that bind to cis-regulatory elements. Applying our proposed method to this problem shows substantial improvement over the state-of-the-art alternative in real data. [en_US]
dc.eprint.version: Final published version [en_US]
dc.identifier.citation: Han, F., Ji, H., Ji, Z., & Wang, H. (2017). A provable smoothing approach for high dimensional generalized regression with applications in genomics. Electronic Journal of Statistics, 11(2), 4347–4403. https://doi.org/10.1214/17-EJS1352 [en_US]
dc.identifier.issn: 1935-7524 [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/16648
dc.language.iso: en_US [en_US]
dc.publisher: Institute of Mathematical Statistics [en_US]
dc.relation.isversionof: 10.1214/17-EJS1352 [en_US]
dc.relation.journal: Electronic Journal of Statistics [en_US]
dc.rights: Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.source: Other [en_US]
dc.subject: Semiparametric regression [en_US]
dc.subject: generalized regression model [en_US]
dc.subject: rank-based M-estimator [en_US]
dc.subject: smoothing approximation [en_US]
dc.subject: transcription factor binding [en_US]
dc.title: A provable smoothing approach for high dimensional generalized regression with applications in genomics [en_US]
dc.type: Article [en_US]
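The abstract mentions rank-based M-estimators for the generalized regression model and a smoothing approach, but gives no formulas. The sketch below is a hedged illustration under assumptions not stated in this record: it takes the maximum rank correlation objective as the rank-based M-estimator and replaces its step indicator with a logistic sigmoid of bandwidth `h`. The function names (`rank_objective`, `smoothed_rank_objective`, `simulate`), the exponential link used to simulate data, and the sigmoid itself are hypothetical choices for illustration; the paper's own family of smoothing functions, its calculation of the amount of smoothing, and its high dimensional treatment are not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def index(beta, row):
    """Single index x^T beta."""
    return sum(b * x for b, x in zip(beta, row))


def rank_objective(beta, X, Y):
    """Unsmoothed rank-based objective (maximum rank correlation form):
    share of ordered pairs with Y_i > Y_j and x_i^T beta > x_j^T beta.
    Piecewise constant in beta, so it has no useful gradient."""
    n = len(Y)
    s = [index(beta, row) for row in X]
    hits = sum(
        1.0
        for i in range(n)
        for j in range(n)
        if i != j and Y[i] > Y[j] and s[i] > s[j]
    )
    return hits / (n * (n - 1))


def smoothed_rank_objective(beta, X, Y, h):
    """Same objective with the step 1(s_i - s_j > 0) replaced by
    sigmoid((s_i - s_j) / h). Assumed smoothing function: logistic sigmoid;
    the bandwidth h > 0 sets how much smoothing is applied."""
    n = len(Y)
    s = [index(beta, row) for row in X]
    total = sum(
        sigmoid((s[i] - s[j]) / h)
        for i in range(n)
        for j in range(n)
        if i != j and Y[i] > Y[j]
    )
    return total / (n * (n - 1))


def simulate(n, beta_star, noise=0.5, seed=0):
    """Hypothetical data: Y = exp(x^T beta* + noise * eps), an increasing
    link that neither objective needs to know."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in beta_star] for _ in range(n)]
    Y = [math.exp(index(beta_star, row) + noise * rng.gauss(0.0, 1.0)) for row in X]
    return X, Y


beta_star = [1.0, -1.0, 0.5]
X, Y = simulate(60, beta_star)
for h in (1.0, 0.1, 0.01):
    print("h =", h, "smoothed:", smoothed_rank_objective(beta_star, X, Y, h))
print("unsmoothed:", rank_objective(beta_star, X, Y))
```

As `h` shrinks, the smoothed value approaches the unsmoothed one, while remaining differentiable in `beta` for any fixed `h > 0`. Both objectives depend on `beta` only through orderings of the index (the smoothed one through `beta / h`), so `beta*` is identified only up to scale in this sketch.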
Files
Original bundle
Name: euclid.ejs.1510801790.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format
Description: Article
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission