
Selection of variables and dimension reduction in high-dimensional non-parametric regression

Abstract

We consider an $\ell_1$-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension $d$ of the input variable $X$ is very large (sometimes growing with the number of observations). Estimation of a $\beta$-regular regression function $f$ cannot be faster than the slow rate $n^{-2\beta/(2\beta+d)}$. Fortunately, in some situations, $f$ depends only on a few of the coordinates of $X$. In this paper, we construct two procedures. The first one selects, with high probability, these relevant coordinates. Then, using this subset selection method, we run a local polynomial estimator (on the set of selected coordinates) to estimate the regression function at the rate $n^{-2\beta/(2\beta+d^*)}$, where $d^*$, the "real" dimension of the problem (the exact number of variables on which $f$ depends), replaces the dimension $d$ of the design. To achieve this result, we use an $\ell_1$-penalization method in this non-parametric setup.
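The following is a minimal sketch of the two-stage idea described above, not the authors' exact procedure: stage one selects candidate coordinates with an $\ell_1$-penalized fit (here a plain cross-validated Lasso stands in for the paper's non-parametric $\ell_1$ penalization), and stage two runs a local linear (degree-one local polynomial) estimator restricted to the selected coordinates. The helper `local_linear_predict`, the Gaussian kernel, and the bandwidth value are illustrative assumptions.

```python
# Sketch only: two-stage estimation under the assumption that f depends on few coordinates.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Synthetic design: d = 50 covariates, but f depends on d* = 2 of them (coordinates 0 and 1).
n, d = 400, 50
X = rng.uniform(-1.0, 1.0, size=(n, d))
def f(x):
    return np.sin(np.pi * x[:, 0]) + x[:, 1] ** 3
y = f(X) + 0.1 * rng.standard_normal(n)

# Stage 1: l1-penalized selection of the active coordinates
# (a linear Lasso is a simplification of the paper's non-parametric l1 method).
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
print("selected coordinates:", selected)

# Stage 2: local linear estimator on the selected coordinates only.
def local_linear_predict(X_sel, y, x0_sel, bandwidth=0.3):
    """Local linear fit at the query point x0_sel with a Gaussian kernel (hypothetical helper)."""
    diff = X_sel - x0_sel                                   # centred design, shape (n, d*)
    w = np.exp(-0.5 * np.sum((diff / bandwidth) ** 2, axis=1))
    Z = np.hstack([np.ones((X_sel.shape[0], 1)), diff])     # intercept + linear terms
    Zw = Z * w[:, None]                                      # kernel-weighted design
    beta = np.linalg.lstsq(Zw.T @ Z, Zw.T @ y, rcond=None)[0]
    return beta[0]                                           # fitted value at x0_sel

x0 = np.zeros(d)
estimate = local_linear_predict(X[:, selected], y, x0[selected])
print("f_hat(x0) =", estimate, " true f(x0) =", f(x0[None, :])[0])
```

Because the second stage works only in the $d^*$-dimensional selected subspace, its kernel weights and local fits behave as in a $d^*$-dimensional problem, which is the informal reason the rate improves from $n^{-2\beta/(2\beta+d)}$ to $n^{-2\beta/(2\beta+d^*)}$.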
