arXiv: 1710.09430
A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 2017
25 October 2017
Prateek Jain, Sham Kakade, Rahul Kidambi, Praneeth Netrapalli, Krishna Pillutla, Aaron Sidford
Papers citing "A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)" (29 of 29 papers shown)
Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
Alexandru Meterez, Depen Morwani, Jingfeng Wu, Costin-Andrei Oncescu, Cengiz Pehlevan, Sham Kakade
16 Oct 2025

On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks
Junwei Su, Chuan Wu
20 Aug 2025

Improved Scaling Laws in Linear Regression via Data Reuse
Licong Lin, Jingfeng Wu, Peter Bartlett
10 Jun 2025

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization
Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang
15 Sep 2024

Scaling Laws in Linear Regression: Compute, Parameters, and Data
Licong Lin, Jingfeng Wu, Sham Kakade, Peter L. Bartlett, Jason D. Lee
12 Jun 2024

Understanding Forgetting in Continual Learning with Linear Regression
Meng Ding, Kaiyi Ji, Haiyan Zhao, Jinhui Xu
27 May 2024

Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
Junwei Su, Difan Zou, Chuan Wu
13 Mar 2024

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
International Conference on Learning Representations (ICLR), 2023
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
12 Oct 2023

Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
International Conference on Learning Representations (ICLR), 2023
Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta
10 Oct 2023

Convergence and concentration properties of constant step-size SGD through Markov chains
Ibrahim Merad, Stéphane Gaïffas
20 Jun 2023

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
Annual Conference Computational Learning Theory (COLT), 2023
Emmanuel Abbe, Enric Boix-Adserà, Theodor Misiakiewicz
21 Feb 2023

Statistical and Computational Guarantees for Influence Diagnostics
Jillian R. Fisher, Lang Liu, Krishna Pillutla, Y. Choi, Zaïd Harchaoui
08 Dec 2022

Local SGD in Overparameterized Linear Regression
Mike Nguyen, Charly Kirst, Nicole Mücke
20 Oct 2022

The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Neural Information Processing Systems (NeurIPS), 2022
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham Kakade
03 Aug 2022

(Nearly) Optimal Private Linear Regression via Adaptive Clipping
Prateeksha Varshney, Abhradeep Thakurta, Prateek Jain
11 Jul 2022

Provable Generalization of Overparameterized Meta-learning Trained with SGD
Neural Information Processing Systems (NeurIPS), 2022
Yu Huang, Yingbin Liang, Longbo Huang
18 Jun 2022

Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Neural Information Processing Systems (NeurIPS), 2022
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade
07 Mar 2022

On the Double Descent of Random Features Models Trained with SGD
Fanghui Liu, Johan A. K. Suykens, Volkan Cevher
13 Oct 2021

Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
International Conference on Machine Learning (ICML), 2021
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham Kakade
12 Oct 2021

The Benefits of Implicit Regularization from SGD in Least Squares Problems
Neural Information Processing Systems (NeurIPS), 2021
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean Phillips Foster, Sham Kakade
10 Aug 2021

Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Annual Conference Computational Learning Theory (COLT), 2021
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade
23 Mar 2021

On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares
Stefan Steinerberger
27 Jul 2020

The Implicit Regularization of Stochastic Gradient Flow for Least Squares
International Conference on Machine Learning (ICML), 2020
Alnur Ali, Guang Cheng, Robert Tibshirani
17 Mar 2020

Robust Aggregation for Federated Learning
IEEE Transactions on Signal Processing (IEEE Trans. Signal Process.), 2019
Krishna Pillutla, Sham Kakade, Zaïd Harchaoui
31 Dec 2019

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Annual Conference Computational Learning Theory (COLT), 2019
Alekh Agarwal, Sham Kakade, Jason D. Lee, G. Mahajan
01 Aug 2019

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Neural Information Processing Systems (NeurIPS), 2019
Rong Ge, Sham Kakade, Rahul Kidambi, Praneeth Netrapalli
29 Apr 2019

Uniform-in-Time Weak Error Analysis for Stochastic Gradient Descent Algorithms via Diffusion Approximation
Communications in Mathematical Sciences (Comm. Math. Sci.), 2019
Yuanyuan Feng, Tingran Gao, Lei Li, Jian-Guo Liu, Yulong Lu
02 Feb 2019

Iterate averaging as regularization for stochastic gradient descent
Gergely Neu, Lorenzo Rosasco
22 Feb 2018

HiGrad: Uncertainty Quantification for Online Learning and Stochastic Approximation
Weijie J. Su, Yuancheng Zhu
13 Feb 2018