13

High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models

Zhou Fan
Leda Wang
Main:60 Pages
6 Figures
Bibliography:4 Pages
Abstract

We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size nn and data dimension dd increase proportionally, for any sub-linear batch size κnα\kappa \asymp n^\alpha where α[0,1)\alpha \in [0,1), and for a commensurate ``critical'' scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME) which provides a Gaussian diffusion approximation to SGD.

View on arXiv
Comments on this paper