
Statistical Inference with Stochastic Gradient Algorithms

Abstract

Tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results in realistic finite-sample regimes.
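As a rough illustration of the iterate-averaging idea mentioned in the abstract, the following is a minimal sketch, not the paper's algorithm or experimental setup. It assumes a toy least-squares problem with synthetic data; the function name `averaged_sgd`, the step size, the burn-in fraction, and the data-generating choices are all hypothetical and chosen only for illustration. It runs single-sample stochastic gradient descent with a fixed step size and averages the iterates after a burn-in period, then compares the average with the ordinary least-squares solution (which coincides with the MLE under Gaussian noise).

```python
import numpy as np


def averaged_sgd(X, y, step_size=0.05, n_epochs=20, burn_in_frac=0.5, seed=0):
    """Fixed-step-size SGD on least squares with iterate averaging.

    Hypothetical helper for illustration only. Returns the average of the
    post-burn-in iterates and the final iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    total_steps = n_epochs * n
    burn_in = int(burn_in_frac * total_steps)
    running_sum = np.zeros(d)
    n_averaged = 0
    for t in range(total_steps):
        i = rng.integers(n)                      # sample one observation
        grad = (X[i] @ theta - y[i]) * X[i]      # gradient of 0.5 * squared residual
        theta = theta - step_size * grad         # step size is held fixed
        if t >= burn_in:                         # average only after burn-in
            running_sum += theta
            n_averaged += 1
    return running_sum / n_averaged, theta


# Synthetic data (assumed for this sketch): linear model with Gaussian noise.
rng = np.random.default_rng(1)
n, d = 2000, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.5, size=n)

theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares / Gaussian MLE
theta_avg, theta_last = averaged_sgd(X, y)

print("distance of averaged iterate from OLS:", np.linalg.norm(theta_avg - theta_ols))
print("distance of last iterate from OLS:   ", np.linalg.norm(theta_last - theta_ols))
```

In this toy setting the individual iterates keep fluctuating around the least-squares solution because the step size never decays, while their average settles much closer to it. The sketch does not check the abstract's covariance or Bernstein--von Mises-like claims; it only shows the mechanics of averaging under a fixed step size.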
