Cited By: arXiv 2003.09729
A new regret analysis for Adam-type algorithms
International Conference on Machine Learning (ICML), 2020
21 March 2020
Ahmet Alacaoglu
Yura Malitsky
P. Mertikopoulos
Volkan Cevher
ODL
Papers citing
"A new regret analysis for Adam-type algorithms"
34 / 34 papers shown
Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
B. Zhang
Ioannis Anagnostides
Tuomas Sandholm
210
2
0
06 Oct 2025
How to Set β₁, β₂ in Adam: An Online Learning Perspective
Quan Nguyen
OffRL
343
0
0
03 Oct 2025
In Search of Adam's Secret Sauce
Antonio Orvieto
Robert Gower
451
26
0
27 May 2025
Learning Rate Annealing Improves Tuning Robustness in Stochastic Optimization
Amit Attia
Tomer Koren
470
2
0
12 Mar 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
International Conference on Learning Representations (ICLR), 2025
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zinan Lin
Shiwei Liu
538
22
0
12 Jan 2025
Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté
Mariella Dimiccoli
AI4TS
379
2
0
27 Dec 2024
CAdam: Confidence-Based Optimization for Online Learning
Shaowen Wang
Anan Liu
Jian Xiao
Huan Liu
Yuekui Yang
...
Suncong Zheng
Wei-Qiang Zhang
Di Wang
Jie Jiang
Jian Li
417
2
0
29 Nov 2024
Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Conference on Learning Theory (COLT), 2024
Devyani Maladkar
Ruichen Jiang
Aryan Mokhtari
524
6
0
07 Jun 2024
How Free is Parameter-Free Stochastic Optimization?
International Conference on Machine Learning (ICML), 2024
Amit Attia
Tomer Koren
ODL
495
11
0
05 Feb 2024
Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
Kwangjun Ahn
Zhiyu Zhang
Yunbum Kook
Yan Dai
349
24
0
02 Feb 2024
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
International Conference on Machine Learning (ICML), 2023
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
215
8
0
02 Jul 2023
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
International Conference on Learning Representations (ICLR), 2023
Frederik Kunstner
Jacques Chen
J. Lavington
Mark Schmidt
355
111
0
27 Apr 2023
SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
International Conference on Machine Learning (ICML), 2023
Amit Attia
Tomer Koren
ODL
311
32
0
17 Feb 2023
Differentially Private Adaptive Optimization with Delayed Preconditioners
International Conference on Learning Representations (ICLR), 2022
Tian Li
Manzil Zaheer
Ziyu Liu
Sashank J. Reddi
H. B. McMahan
Virginia Smith
319
17
0
01 Dec 2022
Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Industrial Conference on Data Mining (IDM), 2022
Wenhan Xian
Feihu Huang
Heng-Chiao Huang
FedML
173
1
0
14 Oct 2022
Divergence Results and Convergence of a Variance Reduced Version of ADAM
Ruiqi Wang
Diego Klabjan
247
9
0
11 Oct 2022
Dynamic Regret of Adaptive Gradient Methods for Strongly Convex Problems
Parvin Nazari
E. Khorram
ODL
286
5
0
04 Sep 2022
Optimistic Optimisation of Composite Objective with Exponentiated Update
Machine Learning (ML), 2022
Weijia Shao
F. Sivrikaya
S. Albayrak
430
4
0
08 Aug 2022
High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize
International Conference on Learning Representations (ICLR), 2022
Ali Kavis
Kfir Y. Levy
Volkan Cevher
250
49
0
06 Apr 2022
Analysis of Dual-Based PID Controllers through Convolutional Mirror Descent
S. Balseiro
Haihao Lu
Vahab Mirrokni
Balasubramanian Sivan
407
7
0
12 Feb 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
International Conference on Learning Representations (ICLR), 2022
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
417
22
0
12 Feb 2022
AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization
Neurocomputing (Neurocomputing), 2022
Wendyam Eric Lionel Ilboudo
Taisuke Kobayashi
Takamitsu Matsubara
413
18
0
18 Jan 2022
Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Yujia Wang
Lu Lin
Jinghui Chen
391
21
0
01 Nov 2021
Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Neural Information Processing Systems (NeurIPS), 2021
Juntang Zhuang
Yifan Ding
Tommy M. Tang
Nicha Dvornek
S. Tatikonda
James S. Duncan
ODL
295
7
0
11 Oct 2021
A New Adaptive Gradient Method with Gradient Decomposition
Machine Learning (ML), 2021
Zhou Shao
Tong Lin
ODL
154
2
0
18 Jul 2021
AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations
Hongwei Zhang
Weidong Zou
Hongbo Zhao
Qi Ming
Tijin Yan
Yuanqing Xia
Weipeng Cao
ODL
124
0
0
04 Jul 2021
The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods
International Conference on Learning Representations (ICLR), 2021
Wei Tao
Sheng Long
Gao-wei Wu
Qing Tao
171
17
0
15 Feb 2021
On the Last Iterate Convergence of Momentum Methods
International Conference on Algorithmic Learning Theory (ALT), 2021
Xiaoyun Li
Mingrui Liu
Francesco Orabona
435
12
0
13 Feb 2021
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
International Conference on Machine Learning (ICML), 2021
Hanlin Tang
Shaoduo Gan
A. A. Awan
Samyam Rajbhandari
Conglong Li
Xiangru Lian
Ji Liu
Ce Zhang
Yuxiong He
AI4CE
335
103
0
04 Feb 2021
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
Anas Barakat
Pascal Bianchi
W. Hachem
S. Schechtman
360
16
0
07 Dec 2020
A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network
International Conference on Machine Learning (ICML), 2020
Jun-Kun Wang
Chi-Heng Lin
Jacob D. Abernethy
726
26
0
04 Oct 2020
Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)
Sharan Vaswani
I. Laradji
Frederik Kunstner
S. Meng
Mark Schmidt
Damien Scieur
370
30
0
11 Jun 2020
Convergence of adaptive algorithms for weakly convex constrained optimization
Ahmet Alacaoglu
Yura Malitsky
Volkan Cevher
240
14
0
11 Jun 2020
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Dongruo Zhou
Yiqi Tang
Yuan Cao
Ziyan Yang
Quanquan Gu
522
161
0
16 Aug 2018