Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles, Philipp Hennig
22 May 2017 · arXiv:1705.07774
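For reference, the decomposition the title refers to can be sketched as follows. Writing a stochastic gradient coordinate as $g$ with mean $m = \mathbb{E}[g]$ and variance $\sigma^2 = \mathrm{Var}[g]$, and taking Adam's per-coordinate update direction in the idealized exact-moment limit (ignoring bias correction and the $\varepsilon$ term), the update factors into a sign and a variance-dependent magnitude:

$$\frac{m}{\sqrt{\mathbb{E}[g^2]}} \;=\; \frac{m}{\sqrt{m^2 + \sigma^2}} \;=\; \mathrm{sign}(m)\cdot\frac{1}{\sqrt{1 + \sigma^2/m^2}}$$

That is, in this limit Adam behaves like sign descent whose step size is damped by the relative variance of the stochastic gradient; this is the sign/magnitude/variance split the paper analyzes.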

Papers citing "Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients"

30 papers shown

Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari, Issei Sato · 31 Jan 2025 · ODL

Distributed Sign Momentum with Local Steps for Training Transformers
Shuhua Yu, Ding Zhou, Cong Xie, An Xu, Zhi-Li Zhang, Xin Liu, S. Kar · 26 Nov 2024

Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade · 10 Jul 2024

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li · 05 Apr 2024 · OffRL

SignSGD with Federated Voting
Chanho Park, H. Vincent Poor, Namyoon Lee · 25 Mar 2024 · FedML

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti · 29 Feb 2024

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Zhen Qin, Zhishuai Liu, Pan Xu · 24 Oct 2023

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
Lizhang Chen, Bo Liu, Kaizhao Liang, Qian Liu · 09 Oct 2023 · ODL

Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
A. Ma, Yangchen Pan, Amir-massoud Farahmand · 13 Aug 2023 · AAML

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu, Zhiyuan Li, David Leo Wright Hall, Percy Liang, Tengyu Ma · 23 May 2023 · VLM

Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He · 21 May 2023

Symbolic Discovery of Optimization Algorithms
Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, ..., Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, Quoc V. Le · 13 Feb 2023

A Deep Learning Approach to Generating Photospheric Vector Magnetograms of Solar Active Regions for SOHO/MDI Using SDO/HMI and BBSO Data
Haodi Jiang, Qin Li, Zhihang Hu, Nian Liu, Yasser Abduallah, ..., Genwei Zhang, Yan Xu, Wynne Hsu, J. T. Wang, Haimin Wang · 04 Nov 2022

An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization
Elvin Lo, Pin-Yu Chen · 27 Oct 2022

Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks
G. Farhani, Alexander Kazachek, Boyu Wang · 29 Jun 2022

Logit Normalization for Long-tail Object Detection
Liang Zhao, Yao Teng, Limin Wang · 31 Mar 2022

A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang, Kenta Niwa, W. Kleijn · 24 Mar 2022 · ODL

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He · 12 Feb 2022 · OffRL, AI4CE

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu · 25 Aug 2021 · MLT, AI4CE

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu, Renkun Ni, Zheng Xu, Kezhi Kong, W. R. Huang, Tom Goldstein · 16 Feb 2021 · ODL

A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms
Chao Ma, Lei Wu, Weinan E · 14 Sep 2020 · ODL

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin · 09 Jul 2020 · ODL

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Lu Yu, Krishnakumar Balasubramanian, S. Volgushev, Murat A. Erdogdu · 14 Jun 2020

LaProp: Separating Momentum and Adaptivity in Adam
Liu Ziyin, Zhikang T. Wang, Masahito Ueda · 12 Feb 2020 · ODL

Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Frederik Kunstner, Lukas Balles, Philipp Hennig · 29 May 2019

A Sufficient Condition for Convergences of Adam and RMSProp
Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu · 23 Nov 2018

signSGD with Majority Vote is Communication Efficient And Fault Tolerant
Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar · 11 Oct 2018 · FedML

signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar · 13 Feb 2018 · FedML, ODL

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · 15 Sep 2016 · ODL

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi, J. Nutini, Mark W. Schmidt · 16 Aug 2016