Parallelizing Stochastic Gradient Descent for Least Squares Regression:
  mini-batching, averaging, and model misspecification



Papers citing "Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification"

21 papers shown
Anytime Tail Averaging. Nicolas Le Roux. 13 Feb 2019.
Online to Offline Conversions, Universality and Adaptive Minibatch Sizes. Neural Information Processing Systems (NeurIPS), 2017. 30 May 2017.
Stochastic Composite Least-Squares Regression with convergence rate O(1/n). Annual Conference Computational Learning Theory (COLT), 2017. 21 Feb 2017.
