
High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models

Zhou Fan
Leda Wang
Main: 60 pages
6 figures
Bibliography: 4 pages
Abstract

We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size $n$ and data dimension $d$ increase proportionally, for any sub-linear batch size $\kappa \asymp n^\alpha$ with $\alpha \in [0,1)$, and for a commensurate ``critical'' scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME), which provides a Gaussian diffusion approximation to SGD. Our analyses imply that the limiting dynamics of SGD are the same for any batch-size scaling $\alpha \in [0,1)$, and that under a commensurate scaling of the learning rate, the dynamics of SGD, the SME, and gradient flow are mutually distinct, with those of SGD and the SME coinciding in the special case of a linear model. We recover a known dynamical mean-field characterization of gradient flow in the limit of small learning rate, and of one-pass/online SGD in the limit of increasing sample size $n/d \to \infty$.
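To make the setup concrete, below is a minimal sketch of multi-pass, mini-batch SGD on a multi-index model with isotropic Gaussian data. The link function $g$, the proportion $n/d$, the exponent $\alpha$, and the learning-rate choice $\eta \propto \kappa / d$ are illustrative assumptions for this sketch, not the paper's exact scalings or experiments.

```python
import numpy as np

# Minimal sketch (assumed parameters, not the paper's exact setup):
# multi-pass mini-batch SGD on a two-index model with isotropic data.
rng = np.random.default_rng(0)

d, n = 500, 1000                       # dimension and sample size, n ~ d
alpha = 0.5                            # sub-linear batch-size exponent in [0, 1)
kappa = int(n ** alpha)                # batch size kappa ~ n^alpha
eta = kappa / d                        # one plausible "commensurate" learning-rate scaling

# Teacher: multi-index model y = g(X W* / sqrt(d)) with k = 2 index directions
k = 2
W_star = rng.standard_normal((d, k))
X = rng.standard_normal((n, d))        # isotropic random data
y = np.tanh(X @ W_star / np.sqrt(d)).sum(axis=1)   # assumed link function g

# Student of the same form, trained on the squared-error empirical risk
W = rng.standard_normal((d, k))

def grad(W, idx):
    """Mini-batch gradient of 0.5 * mean (g_W(x) - y)^2 over rows idx."""
    Xb, yb = X[idx], y[idx]
    z = Xb @ W / np.sqrt(d)
    resid = np.tanh(z).sum(axis=1) - yb
    # chain rule through z = X W / sqrt(d); tanh'(z) = 1 - tanh(z)^2
    return Xb.T @ (resid[:, None] * (1.0 - np.tanh(z) ** 2)) / (len(idx) * np.sqrt(d))

for t in range(5000):
    # fresh mini-batch drawn from the same fixed sample each step (multi-pass SGD)
    idx = rng.choice(n, size=kappa, replace=False)
    W -= eta * grad(W, idx)

final_risk = 0.5 * np.mean((np.tanh(X @ W / np.sqrt(d)).sum(axis=1) - y) ** 2)
print("final empirical risk:", final_risk)
```

The paper's characterization describes the coordinate-wise trajectories of iterates such as `W` above in the proportional limit $n, d \to \infty$; the sketch only illustrates the sampling scheme and scalings involved.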
