On the generalization error of norm penalty linear regression models
- OOD
We study linear regression problems of the form $\inf_\beta \big(\mathbb{E}_{\hat{\mathbb{P}}_n}[\,|Y-\langle \beta, X\rangle|^r\,]\big)^{1/r} + \delta\,\|\beta\|$ with $r \ge 1$, a convex penalty $\|\cdot\|$, and the empirical measure $\hat{\mathbb{P}}_n$ of the data. Well-known examples include the square-root lasso, square-root sorted-$\ell_1$ penalization, and penalized least absolute deviations regression. We show that, under benign regularity assumptions on $\|\cdot\|$, such procedures naturally provide robust generalization, as the problem can be reformulated as a distributionally robust optimization (DRO) problem over a type of max-sliced Wasserstein ball $B_\delta(\hat{\mathbb{P}}_n)$, i.e. $\hat\beta$ solves the linear regression problem iff it solves $\inf_\beta \sup_{\mathbb{Q}\in B_\delta(\hat{\mathbb{P}}_n)} \big(\mathbb{E}_{\mathbb{Q}}[\,|Y-\langle \beta, X\rangle|^r\,]\big)^{1/r}$. Our proof of this result is constructive: it identifies the worst-case measure in the DRO problem, which is given by an additive perturbation of $\hat{\mathbb{P}}_n$. We argue that the balls $B_\delta(\hat{\mathbb{P}}_n)$ are the natural ones to consider in this framework, as they yield a computationally efficient procedure, comparable in cost to non-robust methods, together with optimal robustness guarantees. In fact, our generalization bounds are of order $\sqrt{d/n}$, up to logarithmic factors, and thus do not suffer from the curse of dimensionality, as is the case for known generalization bounds based on the Wasserstein metric on $\mathbb{R}^{d+1}$. Moreover, the bounds provide theoretical support for recommending a regularization parameter $\delta$ of the same order for the linear regression problem.
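As a concrete illustration of the penalized problem above, here is a minimal sketch of its square-root lasso instance ($r = 2$ with the $\ell_1$ penalty), solved with cvxpy; the synthetic data, variable names, and the choice $\delta = \sqrt{d/n}$ (the order suggested by the bounds) are illustrative assumptions, not code from the paper.

```python
# Square-root lasso sketch: the r = 2, l1-penalty instance of the
# penalized regression problem above. Synthetic data and the choice
# delta = sqrt(d/n) are illustrative assumptions, not from the paper.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50                       # sample size and covariate dimension
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 1.0                  # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

delta = np.sqrt(d / n)               # regularization of the order recommended above

beta = cp.Variable(d)
# Objective: (E_{P_n}|Y - <beta, X>|^2)^{1/2} + delta * ||beta||_1
objective = cp.norm2(y - X @ beta) / np.sqrt(n) + delta * cp.norm1(beta)
problem = cp.Problem(cp.Minimize(objective))
problem.solve()                      # a convex SOCP, comparable in cost to a non-robust fit
print("estimated support:", np.flatnonzero(np.abs(beta.value) > 1e-3))
```

By the DRO equivalence stated above, the resulting $\hat\beta$ is simultaneously the solution of the worst-case problem over the max-sliced Wasserstein ball of radius $\delta$ around $\hat{\mathbb{P}}_n$.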