A Continuous-Time View of Early Stopping for Least Squares Regression
We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. In a random matrix theory setup, which allows the number of samples and features to diverge in such a way that their ratio converges to a positive constant, we derive and analyze an asymptotic risk expression for gradient flow. In particular, we compare the asymptotic risk profile of gradient flow to that of ridge regression. When the feature covariance is spherical, we show that the optimal asymptotic gradient flow risk is between 1 and 1.22 times the optimal asymptotic ridge risk. Furthermore, under an explicit calibration between the two risk curves, we prove that the asymptotic gradient flow risk is no more than 1.69 times the asymptotic ridge risk, at all points along the path. We present a number of other results illustrating the connections between gradient flow and regularization, and supporting numerical experiments.
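The comparison described above can be explored numerically. The sketch below (not code from the paper; all names and parameter values are illustrative choices) simulates a least squares problem with spherical feature covariance and compares the best estimation risk along the gradient flow path to the best risk along the ridge path. It uses the standard closed forms: for the loss (1/2n)||y - Xb||^2 started at b(0) = 0, gradient flow at time t is b(t) = V diag((1 - exp(-s_i^2 t/n))/s_i) U^T y, where X = U diag(s) V^T is the SVD of X, while ridge with parameter lambda is b = V diag(s_i/(s_i^2 + n*lambda)) U^T y.

```python
# Numerical sketch: gradient flow (early stopping) vs. ridge regression
# on simulated least squares data with spherical feature covariance.
# Problem sizes and grids are illustrative assumptions, not values from
# the paper.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 400, 200, 1.0           # samples, features, noise level
beta = rng.normal(size=p) / np.sqrt(p)  # true coefficients
X = rng.normal(size=(n, p))             # spherical (identity) covariance
y = X @ beta + sigma * rng.normal(size=n)

# SVD of the design matrix; both paths are diagonal in this basis.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uty = U.T @ y

def gf_risk(t):
    # Gradient flow estimate at time t, via its closed form, and its
    # squared estimation error against the true coefficients.
    coef = (1.0 - np.exp(-s**2 * t / n)) / s
    b = Vt.T @ (coef * Uty)
    return np.sum((b - beta) ** 2)

def ridge_risk(lam):
    # Ridge estimate with tuning parameter lam, same error metric.
    coef = s / (s**2 + n * lam)
    b = Vt.T @ (coef * Uty)
    return np.sum((b - beta) ** 2)

# Minimize each risk over a wide grid of its tuning parameter.
ts = np.logspace(-2, 3, 60)
lams = np.logspace(-3, 2, 60)
best_gf = min(gf_risk(t) for t in ts)
best_ridge = min(ridge_risk(lam) for lam in lams)

print(f"optimal gradient flow risk: {best_gf:.4f}")
print(f"optimal ridge risk:         {best_ridge:.4f}")
print(f"ratio:                      {best_gf / best_ridge:.3f}")
```

In this spherical setting the paper's theory predicts the ratio of the two optimal risks stays close to 1 (asymptotically between 1 and 1.22), which a run of this sketch can be used to check informally at finite sample size.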