On the insufficiency of existing momentum schemes for Stochastic Optimization
Information Theory and Applications Workshop (ITA), 2018
arXiv: 1803.05591 (v2, latest)
15 March 2018
Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham Kakade
ODL

Papers citing "On the insufficiency of existing momentum schemes for Stochastic Optimization"

Showing 21 of 71 citing papers (page 2 of 2)

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent
SIAM Journal on Imaging Sciences (SIIMS), 2020
Bao Wang, T. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
ODL
24 Feb 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

Optimization for deep learning: theory and algorithms
Tian Ding
ODL
19 Dec 2019

Understanding the Role of Momentum in Stochastic Gradient Methods
Neural Information Processing Systems (NeurIPS), 2019
Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao
30 Oct 2019

Demon: Improved Neural Network Training with Momentum Decay
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
John Chen, Cameron R. Wolfe, Zhaoqi Li, Anastasios Kyrillidis
ODL
11 Oct 2019

Adaptive Weight Decay for Deep Neural Networks
IEEE Access, 2019
Kensuke Nakamura, Byung-Woo Hong
21 Jul 2019

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Neural Information Processing Systems (NeurIPS), 2019
Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger C. Grosse
09 Jul 2019

The Role of Memory in Stochastic Optimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2019
Antonio Orvieto, Jonas Köhler, Aurelien Lucchi
02 Jul 2019

An Adaptive Remote Stochastic Gradient Method for Training Neural Networks
Yushu Chen, Hao Jing, Wenlai Zhao, Zhiqiang Liu, Haohuan Fu, Lián Qiao, Wei Xue, Guangwen Yang
ODL
04 May 2019

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Neural Information Processing Systems (NeurIPS), 2019
Rong Ge, Sham Kakade, Rahul Kidambi, Praneeth Netrapalli
29 Apr 2019

A Selective Overview of Deep Learning
Jianqing Fan, Cong Ma, Yiqiao Zhong
BDL, VLM
10 Apr 2019

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
Aaron Defazio, Léon Bottou
UQCV, DRL
11 Dec 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Journal of Machine Learning Research (JMLR), 2019
Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl
08 Nov 2018

Accelerating SGD with momentum for over-parameterized learning
Chaoyue Liu, M. Belkin
ODL
31 Oct 2018

Quasi-hyperbolic momentum and Adam for deep learning
Jerry Ma, Denis Yarats
ODL
16 Oct 2018

Optimal Adaptive and Accelerated Stochastic Gradient Descent
Qi Deng, Yi Cheng, Guanghui Lan
01 Oct 2018

Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration
Soham De, Anirbit Mukherjee, Enayat Ullah
18 Jul 2018

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu
ODL
18 Jun 2018

Interpreting Deep Learning: The Machine Learning Rorschach Test?
Adam S. Charles
AAML, HAI, AI4CE
01 Jun 2018

Predictive Local Smoothness for Stochastic Gradient Methods
Jun Yu Li, Hongfu Liu, Bineng Zhong, Yue Wu, Y. Fu
ODL
23 May 2018

Aggregated Momentum: Stability Through Passive Damping
James Lucas, Shengyang Sun, R. Zemel, Roger C. Grosse
01 Apr 2018