
A new regret analysis for Adam-type algorithms

International Conference on Machine Learning (ICML), 2020
21 March 2020
Ahmet Alacaoglu, Yura Malitsky, P. Mertikopoulos, Volkan Cevher
ODL
arXiv: 2003.09729 (abs · PDF · HTML)
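
Background, not taken from this page itself: "Adam-type" conventionally refers to adaptive-gradient updates built from exponential moving averages of the gradient and its elementwise square. The sketch below is the canonical Adam recursion with the standard symbols (decay rates β1, β2, step size α, stabilizer ε); it is a generic reference point, not the specific variant analyzed in the paper above or in any single paper listed below.

% Canonical Adam update -- a background sketch under standard notation,
% not the exact algorithm of any one paper on this page.
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t            && \text{(first-moment EMA)}\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2          && \text{(second-moment EMA, elementwise)}\\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} && \text{(bias correction)}\\
x_{t+1} &= x_t - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} && \text{(parameter update)}
\end{align*}

A regret analysis for this family bounds $\sum_t f_t(x_t) - \min_x \sum_t f_t(x)$ over a sequence of losses $f_t$.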

Papers citing "A new regret analysis for Adam-type algorithms"

34 papers

Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
B. Zhang, Ioannis Anagnostides, Tuomas Sandholm
06 Oct 2025

How to Set $β_1, β_2$ in Adam: An Online Learning Perspective
Quan Nguyen
OffRL
03 Oct 2025

In Search of Adam's Secret Sauce
Antonio Orvieto, Robert Gower
27 May 2025

Learning Rate Annealing Improves Tuning Robustness in Stochastic Optimization
Amit Attia, Tomer Koren
12 Mar 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
International Conference on Learning Representations (ICLR), 2025
Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zinan Lin, Shiwei Liu
12 Jan 2025

Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté, Mariella Dimiccoli
AI4TS
27 Dec 2024

CAdam: Confidence-Based Optimization for Online Learning
Shaowen Wang, Anan Liu, Jian Xiao, Huan Liu, Yuekui Yang, ..., Suncong Zheng, Wei-Qiang Zhang, Di Wang, Jie Jiang, Jian Li
29 Nov 2024

Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Annual Conference on Computational Learning Theory (COLT), 2024
Devyani Maladkar, Ruichen Jiang, Aryan Mokhtari
07 Jun 2024

How Free is Parameter-Free Stochastic Optimization?
International Conference on Machine Learning (ICML), 2024
Amit Attia, Tomer Koren
ODL
05 Feb 2024

Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai
02 Feb 2024

Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
International Conference on Machine Learning (ICML), 2023
Yineng Chen, Z. Li, Lefei Zhang, Bo Du, Hai Zhao
02 Jul 2023

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
International Conference on Learning Representations (ICLR), 2023
Frederik Kunstner, Jacques Chen, J. Lavington, Mark Schmidt
27 Apr 2023

SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
International Conference on Machine Learning (ICML), 2023
Amit Attia, Tomer Koren
ODL
17 Feb 2023

Differentially Private Adaptive Optimization with Delayed Preconditioners
International Conference on Learning Representations (ICLR), 2022
Tian Li, Manzil Zaheer, Ziyu Liu, Sashank J. Reddi, H. B. McMahan, Virginia Smith
01 Dec 2022

Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Industrial Conference on Data Mining (IDM), 2022
Wenhan Xian, Feihu Huang, Heng-Chiao Huang
FedML
14 Oct 2022

Divergence Results and Convergence of a Variance Reduced Version of ADAM
Ruiqi Wang, Diego Klabjan
11 Oct 2022

Dynamic Regret of Adaptive Gradient Methods for Strongly Convex Problems
Parvin Nazari, E. Khorram
ODL
04 Sep 2022

Optimistic Optimisation of Composite Objective with Exponentiated Update
Machine Learning (ML), 2022
Weijia Shao, F. Sivrikaya, S. Albayrak
08 Aug 2022

High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize
International Conference on Learning Representations (ICLR), 2022
Ali Kavis, Kfir Y. Levy, Volkan Cevher
06 Apr 2022

Analysis of Dual-Based PID Controllers through Convolutional Mirror Descent
S. Balseiro, Haihao Lu, Vahab Mirrokni, Balasubramanian Sivan
12 Feb 2022

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
International Conference on Learning Representations (ICLR), 2022
Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
OffRL, AI4CE
12 Feb 2022

AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization
Neurocomputing, 2022
Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Takamitsu Matsubara
18 Jan 2022

Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Yujia Wang, Lu Lin, Jinghui Chen
01 Nov 2021

Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Neural Information Processing Systems (NeurIPS), 2021
Juntang Zhuang, Yifan Ding, Tommy M. Tang, Nicha Dvornek, S. Tatikonda, James S. Duncan
ODL
11 Oct 2021

A New Adaptive Gradient Method with Gradient Decomposition
Machine Learning (ML), 2021
Zhou Shao, Tong Lin
ODL
18 Jul 2021

AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations
Hongwei Zhang, Weidong Zou, Hongbo Zhao, Qi Ming, Tijin Yan, Yuanqing Xia, Weipeng Cao
ODL
04 Jul 2021

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods
International Conference on Learning Representations (ICLR), 2021
Wei Tao, Sheng Long, Gao-wei Wu, Qing Tao
15 Feb 2021

On the Last Iterate Convergence of Momentum Methods
International Conference on Algorithmic Learning Theory (ALT), 2021
Xiaoyun Li, Mingrui Liu, Francesco Orabona
13 Feb 2021

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
International Conference on Machine Learning (ICML), 2021
Hanlin Tang, Shaoduo Gan, A. A. Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He
AI4CE
04 Feb 2021

Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
Anas Barakat, Pascal Bianchi, W. Hachem, S. Schechtman
07 Dec 2020

A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network
International Conference on Machine Learning (ICML), 2020
Jun-Kun Wang, Chi-Heng Lin, Jacob D. Abernethy
04 Oct 2020

Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)
Sharan Vaswani, I. Laradji, Frederik Kunstner, S. Meng, Mark Schmidt, Damien Scieur
11 Jun 2020

Convergence of adaptive algorithms for weakly convex constrained optimization
Ahmet Alacaoglu, Yura Malitsky, Volkan Cevher
11 Jun 2020

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu
16 Aug 2018