ResearchTrend.AI
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

arXiv:1808.05671 · 16 August 2018
Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu

Papers citing "On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization"

50 / 107 papers shown
On the Convergence of Muon and Beyond
Da Chang, Yongxiang Liu, Ganzhao Yuan (19 Sep 2025)

Adaptive Preconditioners Trigger Loss Spikes in Adam
Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Z. Xu (05 Jun 2025) [ODL]

Unified Scaling Laws for Compressed Representations
Andrei Panferov, Alexandra Volkova, Ionut-Vlad Modoranu, Vage Egiazarian, M. Safaryan, Dan Alistarh (02 Jun 2025)

LightSAM: Parameter-Agnostic Sharpness-Aware Minimization
Yifei Cheng, Li Shen, Hao Sun, Nan Yin, Xiaochun Cao, Enhong Chen (30 May 2025) [AAML]

Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté, Mariella Dimiccoli (27 Dec 2024) [AI4TS]

Attribute Inference Attacks for Federated Regression Tasks
Francesco Diana, Othmane Marfoq, Chuan Xu, Giovanni Neglia, F. Giroire, Eoin Thomas (19 Nov 2024) [AAML]

Understanding Adam Requires Better Rotation Dependent Assumptions
Lucas Maes, Tianyue H. Zhang, Alexia Jolicoeur-Martineau, Ioannis Mitliagkas, Damien Scieur, Simon Lacoste-Julien, Charles Guille-Escuret (25 Oct 2024)

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert, M. Safaryan, Ionut-Vlad Modoranu, Dan Alistarh (21 Oct 2024) [ODL]

Attack Anything: Blind DNNs via Universal Background Adversarial Attack
Jiawei Lian, Shaohui Mei, X. Wang, Yi Wang, L. Wang, Yingjie Lu, Mingyang Ma, Lap-Pui Chau (17 Aug 2024) [AAML]

The Implicit Bias of Adam on Separable Data
Chenyang Zhang, Difan Zou, Yuan Cao (15 Jun 2024) [AI4CE]

Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Devyani Maladkar, Ruichen Jiang, Aryan Mokhtari (07 Jun 2024)

Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes
Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu (05 Jun 2024)

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
Ionut-Vlad Modoranu, M. Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtárik, Dan Alistarh (24 May 2024) [MQ]

Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning
Jiawu Tian, Liwei Xu, Xiaowei Zhang, Yongqi Li (02 Apr 2024) [ODL]

Regularized DeepIV with Model Selection
Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara (07 Mar 2024)

Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhimin Luo (26 Feb 2024)

Revisiting Convergence of AdaGrad with Relaxed Assumptions
Yusu Hong, Junhong Lin (21 Feb 2024)

AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau, Han Liu, Mladen Kolar (17 Feb 2024) [ODL]

Towards Quantifying the Preconditioning Effect of Adam
Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon (11 Feb 2024)

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong, Junhong Lin (06 Feb 2024)

Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato, Hideaki Iiduka (04 Feb 2024) [ODL]

Probabilistic Guarantees of Stochastic Recursive Gradient in Non-Convex Finite Sum Problems
Yanjie Zhong, Jiaqi Li, Soumendra Lahiri (29 Jan 2024)

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix
Yun Yue, Zhiling Ye, Jiadi Jiang, Yongchao Liu, Ke Zhang (04 Dec 2023) [ODL]

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato, Hideaki Iiduka (15 Nov 2023)

High Probability Convergence of Adam Under Unbounded Gradients and Affine Variance Noise
Yusu Hong, Junhong Lin (03 Nov 2023)

Demystifying the Myths and Legends of Nonconvex Convergence of SGD
Aritra Dutta, El Houcine Bergou, Soumia Boucherouite, Nicklas Werge, M. Kandemir, Xin Li (19 Oct 2023)

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data
Hao Sun, Li Shen, Shi-Yong Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao (18 Sep 2023) [FedML]

DRAG: Divergence-based Adaptive Aggregation in Federated learning on Non-IID Data
Feng Zhu, Jingjing Zhang, Shengyun Liu, Xin Eric Wang (04 Sep 2023) [FedML]

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup
Yan Sun, Li Shen, Hao Sun, Liang Ding, Dacheng Tao (30 Jul 2023) [FedML]

High Probability Analysis for Non-Convex Stochastic Optimization with Clipping
Shaojie Li, Yong Liu (25 Jul 2023)

Toward Understanding Why Adam Converges Faster Than SGD for Transformers
Yan Pan, Yuanzhi Li (31 May 2023)

Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He (21 May 2023)

Towards Understanding the Generalization of Graph Neural Networks
Huayi Tang, Y. Liu (14 May 2023) [GNN, AI4CE]

UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization
Yiming Jiang, Jinlan Liu, Dongpo Xu, Danilo Mandic (09 May 2023)

Convergence of Adam Under Relaxed Assumptions
Haochuan Li, Alexander Rakhlin, Ali Jadbabaie (27 Apr 2023)

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shi-Yong Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao (01 Mar 2023)

SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
Amit Attia, Tomer Koren (17 Feb 2023) [ODL]

Multilevel Objective-Function-Free Optimization with an Application to Neural Networks Training
Serge Gratton, Alena Kopanicáková, P. Toint (14 Feb 2023)

FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging
Junyi Li, Feihu Huang, Heng-Chiao Huang (13 Feb 2023) [FedML]

Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression
Xiaoyun Li, Ping Li (25 Nov 2022) [FedML]

On the Algorithmic Stability and Generalization of Adaptive Optimization Methods
Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabas Poczos (08 Nov 2022) [ODL, AI4CE]

TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization
Xiang Li, Junchi Yang, Niao He (31 Oct 2022)

Local Model Reconstruction Attacks in Federated Learning and their Uses
Ilias Driouich, Chuan Xu, Giovanni Neglia, F. Giroire, Eoin Thomas (28 Oct 2022) [AAML, FedML]

Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Wenhan Xian, Feihu Huang, Heng-Chiao Huang (14 Oct 2022) [FedML]

Dissecting adaptive methods in GANs
Samy Jelassi, David Dobre, A. Mensch, Yuanzhi Li, Gauthier Gidel (09 Oct 2022)

Provable Adaptivity of Adam under Non-uniform Smoothness
Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhirui Ma, Tie-Yan Liu, Zhimin Luo, Wei Chen (21 Aug 2022)

Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka (21 Aug 2022) [ODL]

Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhimin Luo (20 Aug 2022)

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan (13 Aug 2022) [ODL]

Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer (29 Jul 2022) [ODL]