The out-of-sample prediction error of the square-root-LASSO and related estimators

We study the classical problem of predicting an outcome variable, $Y$, using a linear combination of a $d$-dimensional covariate vector, $\mathbf{X}$. We are interested in linear predictors whose coefficients solve
\begin{align*} \inf_{\boldsymbol{\beta} \in \mathbb{R}^d} \left( \mathbb{E}_{\mathbb{P}_n} \left[ \left|Y-\mathbf{X}^{\top}\boldsymbol{\beta} \right|^r \right] \right)^{1/r} +\delta \, \rho\left(\boldsymbol{\beta}\right), \end{align*}
where $\delta > 0$ is a regularization parameter, $\rho$ is a convex penalty function, $\mathbb{P}_n$ is the empirical distribution of the data, and $r \geq 1$. We present three sets of new results. First, we provide conditions under which linear predictors based on these estimators solve a distributionally robust optimization problem: they minimize the worst-case prediction error over distributions that are close to each other in a type of max-sliced Wasserstein metric. Second, we provide a detailed finite-sample and asymptotic analysis of the statistical properties of the balls of distributions over which the worst-case prediction error is computed. Third, we use the distributionally robust optimality and our statistical analysis to present i) an oracle recommendation for the choice of the regularization parameter, $\delta$, that guarantees good out-of-sample prediction error; and ii) a test statistic to rank the out-of-sample performance of two different linear estimators. None of our results rely on sparsity assumptions about the true data-generating process; thus, they broaden the use of the square-root LASSO and related estimators in prediction problems.
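To make the objective concrete, here is a minimal sketch of the special case $r = 2$ with $\rho(\boldsymbol{\beta}) = \|\boldsymbol{\beta}\|_1$, which is the square-root LASSO. The solver is a swapped-in standard technique, not the authors' method: it uses the identity $\sqrt{m} = \min_{\sigma > 0} \sigma/2 + m/(2\sigma)$ for $m = \mathbb{E}_{\mathbb{P}_n}[(Y-\mathbf{X}^{\top}\boldsymbol{\beta})^2]$ and alternates between updating $\sigma$ and solving an ordinary LASSO with penalty $\sigma\delta$ by coordinate descent (the "scaled lasso" reformulation). All function names (`objective`, `lasso_cd`, `sqrt_lasso`) are hypothetical and chosen for illustration.

```python
import numpy as np


def objective(beta, X, Y, delta, r=2.0, penalty=None):
    """Empirical objective (E_{P_n}|Y - X^T beta|^r)^(1/r) + delta * rho(beta).

    `penalty` plays the role of rho; it defaults to the l1 norm (illustrative choice).
    """
    if penalty is None:
        penalty = lambda b: np.abs(b).sum()
    resid = Y - X @ beta
    return np.mean(np.abs(resid) ** r) ** (1.0 / r) + delta * penalty(beta)


def soft_threshold(z, t):
    """Proximal operator of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, Y, lam, beta0, n_sweeps=200):
    """Coordinate descent for (1/(2n)) * ||Y - X beta||^2 + lam * ||beta||_1."""
    n, d = X.shape
    beta = beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2
    resid = Y - X @ beta
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of X_j with the partial residual that excludes coordinate j.
            z = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(z, lam) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])  # keep residual in sync
            beta[j] = new
    return beta


def sqrt_lasso(X, Y, delta, n_outer=50, tol=1e-8):
    """Minimize sqrt(mean((Y - X beta)^2)) + delta * ||beta||_1 (r = 2, l1 penalty).

    Alternates sigma = sqrt(mean residual^2) with a LASSO step whose penalty is
    sigma * delta; the joint problem in (beta, sigma) is convex.
    """
    n, d = X.shape
    beta = np.zeros(d)
    sigma = np.sqrt(np.mean(Y ** 2))
    for _ in range(n_outer):
        beta = lasso_cd(X, Y, sigma * delta, beta)
        new_sigma = np.sqrt(np.mean((Y - X @ beta) ** 2))
        converged = abs(new_sigma - sigma) < tol or new_sigma == 0.0
        sigma = new_sigma
        if converged:
            break
    return beta
```

A quick sanity check on the sketch: at $\boldsymbol{\beta} = 0$ the subgradient condition reduces to $\|\mathbf{X}^{\top}Y/n\|_\infty / \sqrt{\mathbb{E}_{\mathbb{P}_n}[Y^2]} \leq \delta$, so for any $\delta$ above that threshold `sqrt_lasso` should return the zero vector.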