Closed-form norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias

For overparameterized linear regression with isotropic Gaussian design and the minimum-$\ell_p$-norm interpolator $\widehat{w_p}$, we give a unified, high-probability characterization of how the family of parameter norms $\{ \lVert \widehat{w_p} \rVert_r \}_{r \in [1,p]}$ scales with sample size. We solve this basic but unresolved question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $\widehat{w_p}$, yielding closed-form predictions for (i) a data-dependent transition (the "elbow"), and (ii) a universal threshold that separates the $r$'s whose norms plateau from those that continue to grow with an explicit exponent. This unified solution resolves the scaling of *all* norms within the family under $\ell_p$-biased interpolation and explains, in one picture, which norms saturate and which keep increasing as the sample size grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale to an effective $p$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Given that many generalization proxies depend on parameter norms, our results suggest that their predictive power will depend sensitively on which norm is used.
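A minimal numerical sketch (not from the paper) of the setup in the abstract, specialized to $p = 2$, where the minimum-norm interpolator has a closed form via the pseudoinverse: it tracks several $\ell_r$ norms of $\widehat{w_2}$ as the sample size grows under an isotropic Gaussian design. The spiked signal, noise level, and grid of $r$ values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2000                       # ambient dimension (overparameterized: n < d)
w_star = np.zeros(d)
w_star[0] = 1.0                # single signal "spike"; remaining coordinates are null
sigma = 0.1                    # noise level (assumed)
r_grid = [1.0, 1.5, 2.0]       # l_r norms within the family [1, p] for p = 2 (assumed)

for n in [50, 100, 200, 400, 800]:
    X = rng.standard_normal((n, d))                 # isotropic Gaussian design
    y = X @ w_star + sigma * rng.standard_normal(n)
    # Minimum-l2-norm interpolator: lstsq returns the least-norm solution
    # of the underdetermined system X w = y.
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    norms = {r: np.sum(np.abs(w_hat) ** r) ** (1.0 / r) for r in r_grid}
    print(n, {r: round(v, 3) for r, v in norms.items()})
```

For the DLN experiments described above, a correspondingly minimal sketch, assuming the standard quadratic parameterization $w = u \odot u - v \odot v$, full-batch gradient descent on the squared loss, and initialization $u = v = \alpha \mathbf{1}$; the data, learning rate, and iteration count are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 500
X = rng.standard_normal((n, d))            # isotropic Gaussian design
w_star = np.zeros(d)
w_star[0] = 1.0                            # sparse "spike" signal (assumed)
y = X @ w_star                             # noiseless labels for simplicity

alpha = 1e-3                               # initialization scale to calibrate
u = np.full(d, alpha)
v = np.full(d, alpha)
lr = 2e-4                                  # step size (assumed)

for _ in range(5000):
    w = u * u - v * v
    grad_w = X.T @ (X @ w - y)             # gradient of 0.5 * ||Xw - y||^2 w.r.t. w
    # Chain rule through the diagonal parameterization w = u*u - v*v.
    u, v = u - lr * 2 * u * grad_w, v + lr * 2 * v * grad_w

w_dln = u * u - v * v
print("train residual:", np.linalg.norm(X @ w_dln - y))
print("l1 norm:", np.linalg.norm(w_dln, 1), " l2 norm:", np.linalg.norm(w_dln, 2))
```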