
Selection of variables and dimension reduction in high-dimensional non-parametric regression

Abstract

We consider an $\ell_1$-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension $d$ of the input variable $X$ is very large (sometimes growing with the number of observations). Estimation of a $\beta$-regular regression function $f$ cannot be faster than the slow rate $n^{-2\beta/(2\beta+d)}$. Fortunately, in some situations, $f$ depends only on a few of the coordinates of $X$. In this paper, we construct two procedures. The first one selects, with high probability, these relevant coordinates. Then, using this subset selection method, we run a local polynomial estimator (on the set of selected coordinates) to estimate the regression function at the rate $n^{-2\beta/(2\beta+d^*)}$, where $d^*$, the "real" dimension of the problem (the exact number of variables on which $f$ depends), replaces the dimension $d$ of the design. To achieve this result, we use an $\ell_1$-penalization method in this non-parametric setup.
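The following is a minimal sketch of the two-stage idea described above, not the authors' exact procedure: stage one selects candidate coordinates with an $\ell_1$-penalized fit (here a plain cross-validated Lasso stands in for the paper's non-parametric $\ell_1$ penalization), and stage two runs a local linear (degree-one local polynomial) estimator restricted to the selected coordinates. The helper `local_linear_predict`, the Gaussian kernel, and the bandwidth value are illustrative assumptions.

```python
# Sketch only: two-stage estimation under the assumption that f depends on few coordinates.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Synthetic design: d = 50 covariates, but f depends on d* = 2 of them (coordinates 0 and 1).
n, d = 400, 50
X = rng.uniform(-1.0, 1.0, size=(n, d))
def f(x):
    return np.sin(np.pi * x[:, 0]) + x[:, 1] ** 3
y = f(X) + 0.1 * rng.standard_normal(n)

# Stage 1: l1-penalized selection of the active coordinates
# (a linear Lasso is a simplification of the paper's non-parametric l1 method).
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
print("selected coordinates:", selected)

# Stage 2: local linear estimator on the selected coordinates only.
def local_linear_predict(X_sel, y, x0_sel, bandwidth=0.3):
    """Local linear fit at the query point x0_sel with a Gaussian kernel (hypothetical helper)."""
    diff = X_sel - x0_sel                                   # centred design, shape (n, d*)
    w = np.exp(-0.5 * np.sum((diff / bandwidth) ** 2, axis=1))
    Z = np.hstack([np.ones((X_sel.shape[0], 1)), diff])     # intercept + linear terms
    Zw = Z * w[:, None]                                      # kernel-weighted design
    beta = np.linalg.lstsq(Zw.T @ Z, Zw.T @ y, rcond=None)[0]
    return beta[0]                                           # fitted value at x0_sel

x0 = np.zeros(d)
estimate = local_linear_predict(X[:, selected], y, x0[selected])
print("f_hat(x0) =", estimate, " true f(x0) =", f(x0[None, :])[0])
```

Because the second stage works only in the $d^*$-dimensional selected subspace, its kernel weights and local fits behave as in a $d^*$-dimensional problem, which is the informal reason the rate improves from $n^{-2\beta/(2\beta+d)}$ to $n^{-2\beta/(2\beta+d^*)}$.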
