1803.05591
On the insufficiency of existing momentum schemes for Stochastic Optimization
Information Theory and Applications Workshop (ITA), 2018
15 March 2018
Rahul Kidambi
Praneeth Netrapalli
Prateek Jain
Sham Kakade
ODL
Papers citing
"On the insufficiency of existing momentum schemes for Stochastic Optimization"
50 / 71 papers shown
Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis
Yuichi Kondo
Hideaki Iiduka
68
0
0
05 Aug 2025
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
Martin Marek
Sanae Lotfi
Aditya Somasundaram
A. Wilson
Micah Goldblum
LRM
328
11
0
09 Jul 2025
Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum
Keisuke Kamo
Hideaki Iiduka
306
2
0
15 Jan 2025
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
International Conference on Learning Representations (ICLR), 2024
Xianliang Li
Jun Luo
Zhiwei Zheng
Hanxiao Wang
Li Luo
Lingkun Wen
Linlong Wu
Sheng Xu
449
4
0
29 Nov 2024
The AdEMAMix Optimizer: Better, Faster, Older
International Conference on Learning Representations (ICLR), 2024
Matteo Pagliardini
Pierre Ablin
David Grangier
ODL
292
22
0
05 Sep 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
H. Cai
Sulaiman A. Alghunaim
Ali H. Sayed
336
1
0
18 Jun 2024
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou
Nicolas Loizou
270
11
0
06 Jun 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
454
14
1
25 May 2024
Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Hristo Papazov
Scott Pesme
Nicolas Flammarion
206
8
0
08 Mar 2024
Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato
Hideaki Iiduka
ODL
389
1
0
04 Feb 2024
Randomized Kaczmarz with geometrically smoothed momentum
SIAM Journal on Matrix Analysis and Applications (SIMAX), 2024
Seth J. Alderman
Roan W. Luikart
Nicholas F. Marshall
170
5
0
17 Jan 2024
(Accelerated) Noise-adaptive Stochastic Heavy-Ball Momentum
Anh Dang
Reza Babanezhad
Sharan Vaswani
221
0
0
12 Jan 2024
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
Boyao Wang
Yuxing Liu
Xiaoyu Wang
Tong Zhang
154
6
0
22 Dec 2023
On the Role of Server Momentum in Federated Learning
Jianhui Sun
Xidong Wu
Heng-Chiao Huang
Aidong Zhang
FedML
214
18
0
19 Dec 2023
Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
Prin Phunyaphibarn
Junghyun Lee
Bohan Wang
Huishuai Zhang
Chulhee Yun
268
1
0
25 Nov 2023
From Optimization to Control: Quasi Policy Iteration
Mohammad Amin Sharifi Kolarijani
Peyman Mohajerin Esfahani
197
3
0
18 Nov 2023
Handling Data Heterogeneity via Architectural Design for Federated Visual Recognition
Neural Information Processing Systems (NeurIPS), 2023
Sara Pieri
Jose Renato Restom
Samuel Horvath
Hisham Cholakkal
FedML
137
9
0
23 Oct 2023
The Marginal Value of Momentum for Small Learning Rate SGD
International Conference on Learning Representations (ICLR), 2023
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
196
10
0
27 Jul 2023
When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu
Bohan Wang
Huishuai Zhang
Zhizheng Zhang
Wei Chen
Na Zheng
235
15
0
15 Jun 2023
Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution
Jinwuk Seok
Chang-Jae Cho
226
0
0
30 May 2023
Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality
Kejie Tang
Weidong Liu
Yichen Zhang
Xi Chen
160
3
0
28 May 2023
First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities
Neural Information Processing Systems (NeurIPS), 2023
Aleksandr Beznosikov
S. Samsonov
Marina Sheshukova
Alexander Gasnikov
A. Naumov
Eric Moulines
230
17
0
25 May 2023
Flatter, faster: scaling momentum for optimal speedup of SGD
Aditya Cowsik
T. Can
Paolo Glorioso
297
6
0
28 Oct 2022
Towards understanding how momentum improves generalization in deep learning
International Conference on Machine Learning (ICML), 2022
Samy Jelassi
Yuanzhi Li
ODL
MLT
AI4CE
183
44
0
13 Jul 2022
A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta
International Conference on Learning Representations (ICLR), 2022
Maksim Velikanov
Denis Kuznedelev
Dmitry Yarotsky
145
10
0
22 Jun 2022
On the fast convergence of minibatch heavy ball momentum
IMA Journal of Numerical Analysis (IMA J. Numer. Anal.), 2022
Raghu Bollapragada
Tyler Chen
Rachel A. Ward
325
20
0
15 Jun 2022
Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions
Neural Information Processing Systems (NeurIPS), 2022
Kiwon Lee
Andrew N. Cheng
Courtney Paquette
Elliot Paquette
145
16
0
02 Jun 2022
Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs
AAAI Conference on Artificial Intelligence (AAAI), 2022
Fanchen Bu
D. Chang
152
7
0
12 May 2022
Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting
International Conference on Learning Representations (ICLR), 2022
Shi-Wee Deng
Yuhang Li
Shanghang Zhang
Shi Gu
417
316
0
24 Feb 2022
Policy Learning and Evaluation with Randomized Quasi-Monte Carlo
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Sébastien M. R. Arnold
P. L'Ecuyer
Liyu Chen
Yi-fan Chen
Fei Sha
OffRL
152
4
0
16 Feb 2022
Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum
Conference on Learning for Dynamics & Control (L4DC), 2021
Junhyung Lyle Kim
Panos Toulis
Anastasios Kyrillidis
450
10
0
11 Nov 2021
An Asymptotic Analysis of Minibatch-Based Momentum Methods for Linear Regression Models
Journal of Computational And Graphical Statistics (JCGS), 2021
Yuan Gao
Xuening Zhu
Haobo Qi
Guodong Li
Riquan Zhang
Hansheng Wang
212
3
0
02 Nov 2021
Does Momentum Help? A Sample Complexity Analysis
Swetha Ganesh
Rohan Deb
Gugan Thoppe
A. Budhiraja
170
2
0
29 Oct 2021
Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent
Kun Zeng
Jinlan Liu
Zhixia Jiang
Dongpo Xu
109
1
0
12 Jun 2021
Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models
Neural Information Processing Systems (NeurIPS), 2021
Courtney Paquette
Elliot Paquette
ODL
157
16
0
07 Jun 2021
Escaping Saddle Points Faster with Stochastic Momentum
International Conference on Learning Representations (ICLR), 2020
Jun-Kun Wang
Chi-Heng Lin
Jacob D. Abernethy
ODL
155
24
0
05 Jun 2021
Training With Data Dependent Dynamic Learning Rates
Shreyas Saxena
Nidhi Vyas
D. DeCoste
ODL
79
1
0
27 May 2021
Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems
Neural Information Processing Systems (NeurIPS), 2021
Stefano Sarao Mannelli
Pierfrancesco Urbani
247
11
0
23 Feb 2021
On the Last Iterate Convergence of Momentum Methods
International Conference on Algorithmic Learning Theory (ALT), 2021
Xiaoyun Li
Mingrui Liu
Francesco Orabona
258
12
0
13 Feb 2021
Regularization in network optimization via trimmed stochastic gradient descent with noisy label
IEEE Access (IEEE Access), 2020
Kensuke Nakamura
Bong-Soo Sohn
Kyoung-Jae Won
Byung-Woo Hong
NoLa
156
2
0
21 Dec 2020
Recent Theoretical Advances in Non-Convex Optimization
Marina Danilova
Pavel Dvurechensky
Alexander Gasnikov
Eduard A. Gorbunov
Sergey Guminov
Dmitry Kamzolov
Innokentiy Shibaev
298
102
0
11 Dec 2020
A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network
International Conference on Machine Learning (ICML), 2020
Jun-Kun Wang
Chi-Heng Lin
Jacob D. Abernethy
509
24
0
04 Oct 2020
Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization
Jun-Kun Wang
Jacob D. Abernethy
279
8
0
04 Oct 2020
Momentum via Primal Averaging: Theoretical Insights and Learning Rate Schedules for Non-Convex Optimization
Aaron Defazio
217
28
0
01 Oct 2020
On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith
Erich Elsen
Soham De
MLT
153
114
0
26 Jun 2020
Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball
Othmane Sebbouh
Robert Mansel Gower
Aaron Defazio
127
23
0
14 Jun 2020
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping
Eduard A. Gorbunov
Marina Danilova
Alexander Gasnikov
195
141
0
21 May 2020
Stochastic batch size for adaptive regularization in deep network optimization
Pattern Recognition (Pattern Recognit.), 2020
Kensuke Nakamura
Stefano Soatto
Byung-Woo Hong
ODL
143
7
0
14 Apr 2020
On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings
International Conference on Machine Learning (ICML), 2020
Mahmoud Assran
Michael G. Rabbat
197
69
0
27 Feb 2020
Statistical Adaptive Stochastic Gradient Methods
Pengchuan Zhang
Hunter Lang
Qiang Liu
Lin Xiao
ODL
167
12
0
25 Feb 2020