On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares
Abstract
We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A: \mathbb{R}^n \rightarrow \mathbb{R}^n$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$ \mathbb{E}\, \left\|Ax_{k+1} - b\right\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \left\|Ax_k - b\right\|_2^2 - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|_2^2, $$
where $x$ is the solution of $Ax = b$. This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms. If $x_k - x$ is mainly comprised of singular vectors corresponding to large singular values, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smoothes.
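The regularization effect is easy to observe numerically. The following is a minimal sketch, not from the paper: plain SGD with uniform row sampling on $\|Ax - b\|_2^2$, where the dimension, the geometric singular-value spectrum, and the step size are all illustrative choices (the paper's exact sampling and step-size conventions may differ). It tracks how the error $x_k - x$ distributes over singular directions; the components along large singular values decay quickly while those along small singular values linger.

```python
# Illustrative sketch (assumptions: n, spectrum, step size, uniform sampling).
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Invertible A with a wide range of singular values, built from its SVD.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.geomspace(10.0, 0.1, n)              # singular values, large to small
A = U @ np.diag(s) @ V.T

x_true = rng.standard_normal(n)
b = A @ x_true

x = np.zeros(n)
eta = 0.5 / np.sum(s**2)                    # small step ~ 1 / ||A||_F^2

for k in range(100001):
    if k % 25000 == 0:
        err = V.T @ (x - x_true)            # error in right-singular basis
        top = np.linalg.norm(err[: n // 5])     # large singular directions
        bot = np.linalg.norm(err[-(n // 5):])   # small singular directions
        print(f"step {k:6d}: |err| top 20% = {top:.3e}, bottom 20% = {bot:.3e}")
    i = rng.integers(n)                     # uniform row sampling
    r = A[i] @ x - b[i]
    x -= eta * n * r * A[i]                 # SGD step (factor 2 absorbed in eta)
```

In this run the error along the top singular directions collapses within the first few thousand steps, while the bottom directions barely move over the whole run: the iterate is smoothed long before it converges, which is the cascade described above.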
