Pivotal Estimation of Nonparametric Functions via Square-root Lasso
In a nonparametric linear regression model we study a variant of LASSO, called square-root LASSO, which does not require knowledge of the scaling parameter σ of the noise or bounds for it. This work derives new finite-sample upper bounds for the prediction norm rate of convergence, the ℓ1-rate of convergence, the ℓ2-rate of convergence, and the sparsity of the square-root LASSO estimator. A lower bound for the prediction norm rate of convergence is also established. In many non-Gaussian noise cases, we rely on moderate deviation theory for self-normalized sums and on new data-dependent empirical process inequalities to achieve Gaussian-like results provided log p = o(n^{1/3}), improving upon results derived in the parametric case that required log p = O(log n). In addition, we derive finite-sample bounds on the performance of ordinary least squares (OLS) applied to the model selected by square-root LASSO, accounting for possible misspecification of the selected model. In particular, we provide mild conditions under which the rate of convergence of OLS post square-root LASSO is no worse than that of square-root LASSO. We also study two extreme cases: parametric noiseless and nonparametric unbounded variance. Square-root LASSO has interesting theoretical guarantees in both. In the parametric noiseless case, unlike LASSO, square-root LASSO achieves exact recovery. In the unbounded variance case it can still be consistent, since its penalty choice does not depend on σ. Finally, we conduct Monte Carlo experiments, which show that the empirical performance of square-root LASSO is very similar to that of LASSO when σ is known. We also emphasize that square-root LASSO can be formulated as a convex programming problem, and its computational burden is similar to that of LASSO.
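For concreteness, a sketch of the estimator in its standard form (conventional notation; the penalty level shown is the usual pivotal choice from the square-root LASSO literature, assumed here rather than quoted from this abstract):

```latex
% Square-root LASSO (sketch; notation assumed): minimize the square root of
% the average squared residual plus an l1 penalty on the coefficients.
\widehat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p}
    \sqrt{\widehat{Q}(\beta)} + \frac{\lambda}{n}\,\|\beta\|_1,
\qquad
\widehat{Q}(\beta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i'\beta\bigr)^2.
% A pivotal penalty level that does not involve sigma can be taken as
% \lambda = c\,\sqrt{n}\,\Phi^{-1}(1 - \alpha/(2p)), \quad c > 1,
% because the gradient of sqrt(Qhat) at the true beta is self-normalized:
% the noise scale cancels between its numerator and denominator.
```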
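The convexity remark at the end of the abstract can also be made concrete: since sqrt(Q̂(β)) = ||y − Xβ||_2 / sqrt(n), the problem is a second-order cone program. Below is a minimal illustrative sketch (not the authors' code) using cvxpy, with the conventional constants c = 1.1 and α = 0.05 assumed for the penalty; it also includes the OLS refit on the selected model ("OLS post square-root LASSO") analyzed in the paper.

```python
# Minimal sketch of square-root LASSO as a convex program, plus OLS refit
# on the selected support. Illustrative only; the constants c, alpha, and
# the support-selection tolerance are conventional choices, assumed here.
import numpy as np
import cvxpy as cp
from scipy.stats import norm

def sqrt_lasso(X, y, c=1.1, alpha=0.05):
    n, p = X.shape
    # Pivotal penalty: requires no knowledge of the noise level sigma.
    lam = c * np.sqrt(n) * norm.ppf(1 - alpha / (2 * p))
    beta = cp.Variable(p)
    # sqrt(Qhat(beta)) = ||y - X @ beta||_2 / sqrt(n), a convex function.
    obj = cp.norm(y - X @ beta, 2) / np.sqrt(n) + (lam / n) * cp.norm1(beta)
    cp.Problem(cp.Minimize(obj)).solve()
    return np.asarray(beta.value).ravel()

def ols_post_sqrt_lasso(X, y, beta_hat, tol=1e-6):
    # Refit OLS using only the regressors selected by square-root LASSO.
    support = np.flatnonzero(np.abs(beta_hat) > tol)
    refit = np.zeros(X.shape[1])
    if support.size:
        refit[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return refit

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, s = 100, 200, 5
    X = rng.standard_normal((n, p))
    beta0 = np.zeros(p)
    beta0[:s] = 1.0
    y = X @ beta0 + rng.standard_normal(n)  # sigma = 1, never passed to the estimator
    b = sqrt_lasso(X, y)
    print("selected:", np.flatnonzero(np.abs(b) > 1e-6))
    print("refit on first", s, "coefficients:", ols_post_sqrt_lasso(X, y, b)[:s])
```

The refit step illustrates the point made above: the ℓ1 penalty shrinks the square-root LASSO coefficients toward zero, and OLS on the selected model removes that shrinkage, which under the paper's mild conditions does not worsen the rate of convergence.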
View on arXiv