
A new regret analysis for Adam-type algorithms

International Conference on Machine Learning (ICML), 2020
21 March 2020
Ahmet Alacaoglu, Yura Malitsky, P. Mertikopoulos, Volkan Cevher
ODL
arXiv: 2003.09729 (abs · PDF · HTML)
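
Background, not taken from this page itself: "Adam-type" conventionally refers to adaptive-gradient updates built from exponential moving averages of the gradient and its elementwise square. The sketch below is the canonical Adam recursion with the standard symbols (decay rates β1, β2, step size α, stabilizer ε); it is a generic reference point, not the specific variant analyzed in the paper above or in any single paper listed below.

% Canonical Adam update -- a background sketch under standard notation,
% not the exact algorithm of any one paper on this page.
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t            && \text{(first-moment EMA)}\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2          && \text{(second-moment EMA, elementwise)}\\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} && \text{(bias correction)}\\
x_{t+1} &= x_t - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} && \text{(parameter update)}
\end{align*}

A regret analysis for this family bounds $\sum_t f_t(x_t) - \min_x \sum_t f_t(x)$ over a sequence of losses $f_t$.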

Papers citing "A new regret analysis for Adam-type algorithms"

34 papers

Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
B. Zhang, Ioannis Anagnostides, Tuomas Sandholm
06 Oct 2025

How to Set $β_1, β_2$ in Adam: An Online Learning Perspective
Quan Nguyen
OffRL
03 Oct 2025

In Search of Adam's Secret Sauce
Antonio Orvieto, Robert Gower
27 May 2025

Learning Rate Annealing Improves Tuning Robustness in Stochastic Optimization
Amit Attia, Tomer Koren
12 Mar 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
International Conference on Learning Representations (ICLR), 2025
Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zinan Lin, Shiwei Liu
12 Jan 2025

Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté, Mariella Dimiccoli
AI4TS
27 Dec 2024

CAdam: Confidence-Based Optimization for Online Learning
Shaowen Wang, Anan Liu, Jian Xiao, Huan Liu, Yuekui Yang, ..., Suncong Zheng, Wei-Qiang Zhang, Di Wang, Jie Jiang, Jian Li
29 Nov 2024

Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Annual Conference on Computational Learning Theory (COLT), 2024
Devyani Maladkar, Ruichen Jiang, Aryan Mokhtari
07 Jun 2024

How Free is Parameter-Free Stochastic Optimization?
International Conference on Machine Learning (ICML), 2024
Amit Attia, Tomer Koren
ODL
05 Feb 2024

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai
02 Feb 2024

Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
International Conference on Machine Learning (ICML), 2023
Yineng Chen, Z. Li, Lefei Zhang, Bo Du, Hai Zhao
02 Jul 2023

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
International Conference on Learning Representations (ICLR), 2023
Frederik Kunstner, Jacques Chen, J. Lavington, Mark Schmidt
27 Apr 2023

SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
International Conference on Machine Learning (ICML), 2023
Amit Attia, Tomer Koren
ODL
17 Feb 2023

Differentially Private Adaptive Optimization with Delayed Preconditioners
International Conference on Learning Representations (ICLR), 2022
Tian Li, Manzil Zaheer, Ziyu Liu, Sashank J. Reddi, H. B. McMahan, Virginia Smith
01 Dec 2022

Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Industrial Conference on Data Mining (IDM), 2022
Wenhan Xian, Feihu Huang, Heng-Chiao Huang
FedML
14 Oct 2022

Divergence Results and Convergence of a Variance Reduced Version of ADAM
Ruiqi Wang, Diego Klabjan
11 Oct 2022

Dynamic Regret of Adaptive Gradient Methods for Strongly Convex Problems
Parvin Nazari, E. Khorram
ODL
04 Sep 2022

Optimistic Optimisation of Composite Objective with Exponentiated Update
Machine Learning (ML), 2022
Weijia Shao, F. Sivrikaya, S. Albayrak
08 Aug 2022

High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize
International Conference on Learning Representations (ICLR), 2022
Ali Kavis, Kfir Y. Levy, Volkan Cevher
06 Apr 2022

Analysis of Dual-Based PID Controllers through Convolutional Mirror Descent
S. Balseiro, Haihao Lu, Vahab Mirrokni, Balasubramanian Sivan
12 Feb 2022

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
International Conference on Learning Representations (ICLR), 2022
Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
OffRL, AI4CE
12 Feb 2022

AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization
Neurocomputing, 2022
Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Takamitsu Matsubara
18 Jan 2022

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Yujia Wang, Lu Lin, Jinghui Chen
01 Nov 2021

Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Neural Information Processing Systems (NeurIPS), 2021
Juntang Zhuang, Yifan Ding, Tommy M. Tang, Nicha Dvornek, S. Tatikonda, James S. Duncan
ODL
11 Oct 2021

A New Adaptive Gradient Method with Gradient Decomposition
Machine Learning (ML), 2021
Zhou Shao, Tong Lin
ODL
18 Jul 2021

AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations
Hongwei Zhang, Weidong Zou, Hongbo Zhao, Qi Ming, Tijin Yan, Yuanqing Xia, Weipeng Cao
ODL
04 Jul 2021

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods
International Conference on Learning Representations (ICLR), 2021
Wei Tao, Sheng Long, Gao-wei Wu, Qing Tao
15 Feb 2021

On the Last Iterate Convergence of Momentum Methods
International Conference on Algorithmic Learning Theory (ALT), 2021
Xiaoyun Li, Mingrui Liu, Francesco Orabona
13 Feb 2021

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
International Conference on Machine Learning (ICML), 2021
Hanlin Tang, Shaoduo Gan, A. A. Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He
AI4CE
04 Feb 2021

Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
Anas Barakat, Pascal Bianchi, W. Hachem, S. Schechtman
07 Dec 2020

A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network
International Conference on Machine Learning (ICML), 2020
Jun-Kun Wang, Chi-Heng Lin, Jacob D. Abernethy
04 Oct 2020

Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)
Sharan Vaswani, I. Laradji, Frederik Kunstner, S. Meng, Mark Schmidt, Damien Scieur
11 Jun 2020

Convergence of adaptive algorithms for weakly convex constrained optimization
Ahmet Alacaoglu, Yura Malitsky, Volkan Cevher
11 Jun 2020

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu
16 Aug 2018