2002.11803
Disentangling Adaptive Gradient Methods from Learning Rates
26 February 2020
Naman Agarwal
Rohan Anil
Elad Hazan
Tomer Koren
Cyril Zhang
Papers citing
"Disentangling Adaptive Gradient Methods from Learning Rates"
34 / 34 papers shown
Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization
Wu Lin
Scott C. Lowe
Felix Dangel
Runa Eschenhagen
Zikun Xu
Roger B. Grosse
407
0
0
03 Sep 2025
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
Teodora Srećković
Jonas Geiping
Antonio Orvieto
MoE
229
8
0
14 Jun 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans
Sergey Levine
Pieter Abbeel
513
8
0
08 Jun 2025
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Runa Eschenhagen
Aaron Defazio
Tsung-Hsien Lee
Richard Turner
Hao-Jun Michael Shi
349
13
0
04 Jun 2025
Gradient Methods with Online Scaling Part I. Theoretical Foundations
Wenzhi Gao
Ya-Chi Chu
Yinyu Ye
Madeleine Udell
362
4
0
29 May 2025
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
505
3
0
18 Mar 2025
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
525
42
0
10 Jul 2024
4-bit Shampoo for Memory-Efficient Network Training
Sike Wang
Jia Li
Pan Zhou
Hua Huang
MQ
553
15
0
28 May 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. D'Oro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
458
5
0
12 Mar 2024
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
Siddharth Singh
Zack Sating
A. Bhatele
ODL
230
1
0
18 Oct 2023
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
International Conference on Learning Representations (ICLR), 2023
Adam Block
Dylan J. Foster
Akshay Krishnamurthy
Max Simchowitz
Cyril Zhang
319
11
0
17 Oct 2023
Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
A. Ma
Yangchen Pan
Amir-massoud Farahmand
AAML
322
9
0
13 Aug 2023
Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
Neural Information Processing Systems (NeurIPS), 2023
Frederik Kunstner
V. S. Portella
Mark Schmidt
Nick Harvey
354
11
0
05 Jun 2023
Mechanic: A Learning Rate Tuner
Neural Information Processing Systems (NeurIPS), 2023
Ashok Cutkosky
Aaron Defazio
Harsh Mehta
OffRL
551
22
0
31 May 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
Neural Information Processing Systems (NeurIPS), 2023
Vladimir Feinberg
Xinyi Chen
Y. Jennifer Sun
Rohan Anil
Elad Hazan
284
20
0
07 Feb 2023
Disentangling the Mechanisms Behind Implicit Regularization in SGD
International Conference on Learning Representations (ICLR), 2022
Cheng-i Wang
Simran Kaur
Tanya Marwah
Saurabh Garg
Zachary Chase Lipton
FedML
340
2
0
29 Nov 2022
VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz
James Harrison
C. Freeman
Amil Merchant
Lucas Beyer
...
Naman Agrawal
Ben Poole
Igor Mordatch
Adam Roberts
Jascha Narain Sohl-Dickstein
362
78
0
17 Nov 2022
FIT: A Metric for Model Sensitivity
International Conference on Learning Representations (ICLR), 2022
Ben Zandonati
Adrian Alan Pol
M. Pierini
Olya Sirkin
Tal Kopetz
MQ
370
11
0
16 Oct 2022
A Scalable Finite Difference Method for Deep Reinforcement Learning
Matthew Allen
John C. Raisbeck
Hakho Lee
219
0
0
14 Oct 2022
Dissecting adaptive methods in GANs
Samy Jelassi
David Dobre
A. Mensch
Yuanzhi Li
Gauthier Gidel
246
5
0
09 Oct 2022
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Rohan Anil
S. Gadanho
Danya Huang
Nijith Jacob
Zhuoshu Li
...
Cristina Pop
Kevin Regan
G. Shamir
Rakesh Shivanna
Qiqi Yan
3DV
345
51
0
12 Sep 2022
Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen
Behrooz Ghorbani
Shankar Krishnan
Naman Agarwal
Sourabh Medapati
...
Daniel Suo
David E. Cardoze
Zachary Nado
George E. Dahl
Justin Gilmer
ODL
311
74
0
29 Jul 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Neural Information Processing Systems (NeurIPS), 2022
Boaz Barak
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
471
173
0
18 Jul 2022
N-Grammer: Augmenting Transformers with latent n-grams
Aurko Roy
Rohan Anil
Guangda Lai
Benjamin Lee
Jeffrey Zhao
...
Yu
Phuong Dao
Christopher Fifty
Zhiwen Chen
Yonghui Wu
220
10
0
13 Jul 2022
Hamiltonian Monte Carlo Particle Swarm Optimizer
Omatharv Bharat Vaidya
Rithvik Terence DSouza
Snehanshu Saha
S. Dhavala
Swagatam Das
311
0
0
08 May 2022
Adaptive Gradient Methods with Local Guarantees
Zhou Lu
Wenhan Xia
Sanjeev Arora
Elad Hazan
ODL
600
12
0
02 Mar 2022
Understanding AdamW through Proximal Methods and Scale-Freeness
Zhenxun Zhuang
Mingrui Liu
Ashok Cutkosky
Francesco Orabona
338
110
0
31 Jan 2022
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas
Juhan Bae
Michael Ruogu Zhang
Stanislav Fort
R. Zemel
Roger C. Grosse
MoMe
552
32
0
22 Apr 2021
How to decay your learning rate
Aitor Lewkowycz
357
30
0
23 Mar 2021
Acceleration via Fractal Learning Rate Schedules
International Conference on Machine Learning (ICML), 2021
Naman Agarwal
Surbhi Goel
Cyril Zhang
228
19
0
01 Mar 2021
A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Zachary Nado
Justin M. Gilmer
Christopher J. Shallue
Rohan Anil
George E. Dahl
ODL
299
31
0
12 Feb 2021
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt
Frank Schneider
Philipp Hennig
ODL
916
195
0
03 Jul 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen
Colin Wei
Jason D. Lee
Tengyu Ma
697
114
0
15 Jun 2020
Accelerated Learning with Robustness to Adversarial Regressors
Conference on Learning for Dynamics & Control (L4DC), 2020
Joseph E. Gaudio
Anuradha M. Annaswamy
J. Moreu
M. Bolender
T. Gibson
385
21
0
04 May 2020