
arXiv:2006.05800
On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Neural Information Processing Systems (NeurIPS), 2020
10 June 2020
Denny Wu
Ji Xu
Abstract

We consider the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_\star + \boldsymbol{\epsilon}$ with $\mathbf{X}\in\mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\boldsymbol{\beta}_\star$ via generalized (weighted) ridge regression: $\hat{\boldsymbol{\beta}}_\lambda = \left(\mathbf{X}^T\mathbf{X} + \lambda \mathbf{\Sigma}_w\right)^\dagger \mathbf{X}^T\mathbf{y}$, where $\mathbf{\Sigma}_w$ is the weighting matrix. Under a random design setting with general data covariance $\mathbf{\Sigma}_x$ and anisotropic prior on the true coefficients $\mathbb{E}\boldsymbol{\beta}_\star\boldsymbol{\beta}_\star^T = \mathbf{\Sigma}_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\boldsymbol{\beta}}_\lambda)^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma \in (1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal setting $\lambda_{\rm opt}$ of the ridge parameter $\lambda$, and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\rm opt}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when both $\mathbf{X}$ and $\boldsymbol{\beta}_\star$ are anisotropic. Finally, we determine the optimal weighting matrix $\mathbf{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\rm opt}$) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
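The estimator in the abstract is a closed-form expression, so it can be sketched directly in numpy. The code below is a minimal illustration (not the authors' code): it implements $\hat{\boldsymbol{\beta}}_\lambda = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{\Sigma}_w)^\dagger \mathbf{X}^T\mathbf{y}$ with the Moore-Penrose pseudoinverse, so that the ridgeless limit $\lambda \to 0$ remains well defined when $p > n$; the synthetic data sizes and the choice $\mathbf{\Sigma}_w = \mathbf{I}$ (standard ridge) are illustrative assumptions.

```python
import numpy as np

def weighted_ridge(X, y, lam, Sigma_w):
    """Generalized (weighted) ridge estimator:
    beta_hat = (X^T X + lam * Sigma_w)^+ X^T y.
    The pseudoinverse (^+) handles the overparameterized
    case p > n, where X^T X is singular."""
    return np.linalg.pinv(X.T @ X + lam * Sigma_w) @ (X.T @ y)

# Illustrative overparameterized example: p > n.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

# Sigma_w = I recovers standard ridge regression.
beta_hat = weighted_ridge(X, y, lam=0.5, Sigma_w=np.eye(p))
```

In the ridgeless limit with $\mathbf{\Sigma}_w = \mathbf{I}$, the formula reduces to the minimum-norm least-squares interpolator $\mathbf{X}^\dagger\mathbf{y}$, which is the baseline whose implicit regularization the paper analyzes.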
