Stochastic gradient descent (SGD) has emerged as the quintessential method in a data scientist's toolbox. Using SGD for high-stakes applications, however, requires careful quantification of the associated uncertainty. Towards that end, in this work we establish a high-dimensional Central Limit Theorem (CLT) for linear functionals of online SGD iterates in overparametrized least-squares regression with non-isotropic Gaussian inputs. We first show that a bias-corrected CLT holds when the number of iterations of the online SGD, $t$, grows sub-linearly in the dimensionality, $d$. To make this result usable in practice, we further develop an online approach for estimating the variance term appearing in the CLT, and establish high-probability bounds for the resulting online estimator. Together with the CLT, this provides a fully online, data-driven way to numerically construct confidence intervals, enabling practical high-dimensional algorithmic inference with SGD; to the best of our knowledge, this is the first such result.
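To make the workflow concrete, here is a minimal Python sketch of the kind of procedure the abstract describes: a single online SGD pass over streaming Gaussian least-squares data, with a running variance estimate used to form a CLT-style confidence interval for a linear functional of the iterate. The step size, the variance proxy, and the interval width below are illustrative assumptions; in particular, the sketch omits the paper's bias correction and does not reproduce its exact online variance estimator.

```python
import numpy as np

def online_sgd_ci(data, v, eta, z=1.96):
    """Minimal sketch (not the paper's exact procedure): one online SGD pass
    for least squares, plus a running plug-in confidence interval for the
    linear functional <v, theta_t>. The bias correction and the precise CLT
    variance term from the paper are replaced by naive stand-ins here."""
    theta = np.zeros_like(v, dtype=float)
    sigma2 = 0.0  # running mean of squared gradient functionals (variance proxy)
    for t, (x, y) in enumerate(data, start=1):
        grad = (x @ theta - y) * x            # per-sample least-squares gradient
        theta = theta - eta * grad            # online SGD update
        g_v = float(v @ grad)
        sigma2 += (g_v * g_v - sigma2) / t    # online update of the variance proxy
        est = float(v @ theta)
        half = z * np.sqrt(sigma2 / t)        # heuristic CLT-style half-width
        yield t, est, (est - half, est + half)

# Toy usage: high-dimensional regime, with t growing slower than d.
rng = np.random.default_rng(0)
d, t_max = 2000, 1000
theta_star = rng.normal(size=d) / np.sqrt(d)
v = np.zeros(d); v[0] = 1.0                   # functional: first coordinate

def gaussian_stream(n):
    for _ in range(n):
        x = rng.normal(size=d)                # isotropic here for simplicity
        yield x, x @ theta_star + 0.1 * rng.normal()

for t, est, (lo, hi) in online_sgd_ci(gaussian_stream(t_max), v, eta=0.5 / d):
    pass
print(f"t={t}: <v, theta_t>={est:.4f}, CI=({lo:.4f}, {hi:.4f}), "
      f"truth={theta_star[0]:.4f}")
```

Note that in the regime the paper studies, where $t$ is sub-linear in $d$, the SGD iterate has not fully converged, which is precisely why the bias correction in the paper's CLT matters; the naive interval above should be expected to undercover without it.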
@article{agrawalla2025_2302.09727,
  title={Statistical Inference for Linear Functionals of Online SGD in High-dimensional Linear Regression},
  author={Bhavya Agrawalla and Krishnakumar Balasubramanian and Promit Ghosal},
  journal={arXiv preprint arXiv:2302.09727},
  year={2025}
}