On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

27 July 2020
Stefan Steinerberger
arXiv:2007.13288
Abstract

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$\mathbb{E}\, \left\| Ax_{k+1} - b \right\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \left\| Ax_k - b \right\|_2^2 - \frac{2}{\|A\|_F^2} \left\| A^T A (x_k - x) \right\|_2^2.$$
This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms. If $x_k - x$ is mainly comprised of large singular vectors, stochastic gradient descent leads to quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smoothes the error.
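
To illustrate the regularization effect described in the abstract, below is a minimal numerical sketch. It assumes one common reading of SGD for least squares: row-sampled updates with row $i$ drawn with probability $\|a_i\|^2 / \|A\|_F^2$ and a Kaczmarz-style step size. The paper's exact step-size and sampling conventions may differ, and all variable names are illustrative.

```python
import numpy as np

# Sketch only: SGD on ||Ax - b||_2^2 realized as row-sampled updates
# (randomized Kaczmarz), rows drawn with probability ||a_i||^2 / ||A||_F^2.
# This is an assumed instantiation, not necessarily the paper's exact setup.
rng = np.random.default_rng(0)

n = 50
A = rng.standard_normal((n, n))      # invertible with probability 1
x_true = rng.standard_normal(n)
b = A @ x_true

row_norms_sq = np.sum(A**2, axis=1)
probs = row_norms_sq / row_norms_sq.sum()   # ||a_i||^2 / ||A||_F^2

x = np.zeros(n)
for _ in range(20_000):
    i = rng.choice(n, p=probs)
    r_i = A[i] @ x - b[i]                    # residual of the sampled row
    x = x - (r_i / row_norms_sq[i]) * A[i]   # Kaczmarz-style SGD step

# Decompose the error in the right-singular-vector basis: components aligned
# with large singular values decay quickly (the "energy cascade"), while
# components along small singular values persist much longer.
U, s, Vt = np.linalg.svd(A)
err_spectrum = Vt @ (x - x_true)
print("error along top singular vector:   ", abs(err_spectrum[0]))
print("error along bottom singular vector:", abs(err_spectrum[-1]))
```

Printing the error coefficients in the singular-vector basis makes the claimed effect visible: after a moderate number of iterations the error is concentrated on the small singular directions, i.e., the iterate has been "smoothed."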
