
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias

Main: 9 pages
20 figures
Bibliography: 4 pages
Appendix: 28 pages
Abstract

For overparameterized linear regression with isotropic Gaussian design and the minimum-$\ell_p$ interpolator, $p \in (1,2]$, we give a unified, high-probability characterization of how the family of parameter norms $\{ \lVert \widehat{w_p} \rVert_r \}_{r \in [1,p]}$ scales with sample size. We solve this basic but unresolved question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star = 2(p-1)$ that separates the norms $\lVert \widehat{w_p} \rVert_r$ which plateau from those that continue to grow with an explicit exponent. This unified solution resolves the scaling of *all* $\ell_r$ norms within the family $r \in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Given that many generalization proxies depend on $\lVert \widehat{w_p} \rVert_r$, our results suggest that their predictive power depends sensitively on which $\ell_r$ norm is used.
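The following is a minimal numerical sketch (not the paper's experimental setup) of the quantities named in the abstract: it computes the minimum-$\ell_p$ interpolator under isotropic Gaussian design via a convex program and prints $\ell_r$ norms for several $r$ alongside the threshold $r_\star = 2(p-1)$. The dimensions, the single-spike ground truth, the noiseless labels, and the use of cvxpy are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: min-l_p interpolation with isotropic Gaussian design, tracking
# how l_r norms of the interpolator behave as the sample size n grows.
# Signal model, dimensions, and solver are assumptions for illustration only.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

d = 1000                      # ambient dimension (overparameterized: n << d)
p = 1.75                      # l_p bias of the interpolator, p in (1, 2]
r_star = 2 * (p - 1)          # abstract's threshold separating plateau vs. growth
w_true = np.zeros(d)
w_true[0] = 1.0               # single signal "spike"; remaining coordinates are null

def min_lp_interpolator(X, y, p):
    """Solve min ||w||_p subject to X w = y (convex for p >= 1)."""
    w = cp.Variable(X.shape[1])
    cp.Problem(cp.Minimize(cp.pnorm(w, p)), [X @ w == y]).solve()
    return w.value

for n in (25, 50, 100, 200):
    X = rng.standard_normal((n, d))      # isotropic Gaussian design
    y = X @ w_true                       # noiseless labels for simplicity
    w_hat = min_lp_interpolator(X, y, p)
    report = "  ".join(
        f"||w||_{r:.2f}={np.sum(np.abs(w_hat) ** r) ** (1.0 / r):.3f}"
        for r in (1.0, r_star, p)        # r values inside the family [1, p]
    )
    print(f"n={n:4d}  {report}")
```

The printed norms can then be compared, as $n$ increases past the elbow $n_\star$, against the abstract's prediction of which $\ell_r$ norms plateau and which keep growing relative to $r_\star$.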
