Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent. SIAM Journal on Imaging Sciences (SIIMS), 2020.
Understanding the Role of Momentum in Stochastic Gradient Methods. Neural Information Processing Systems (NeurIPS), 2019.
Demon: Improved Neural Network Training with Momentum Decay. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
Adaptive Weight Decay for Deep Neural Networks. IEEE Access, 2019.
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model. Neural Information Processing Systems (NeurIPS), 2019.
The Role of Memory in Stochastic Optimization. Conference on Uncertainty in Artificial Intelligence (UAI), 2019.
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure for Least Squares. Neural Information Processing Systems (NeurIPS), 2019.
Measuring the Effects of Data Parallelism on Neural Network Training. Journal of Machine Learning Research (JMLR), 2018.