ResearchTrend.AI
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

arXiv:1808.05671 · 16 August 2018
Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu

Papers citing "On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization"

50 / 107 papers shown
On the Convergence of Muon and Beyond
Da Chang, Yongxiang Liu, Ganzhao Yuan (19 Sep 2025)

Adaptive Preconditioners Trigger Loss Spikes in Adam
Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Z. Xu (05 Jun 2025) [ODL]

Unified Scaling Laws for Compressed Representations
Andrei Panferov, Alexandra Volkova, Ionut-Vlad Modoranu, Vage Egiazarian, M. Safaryan, Dan Alistarh (02 Jun 2025)

LightSAM: Parameter-Agnostic Sharpness-Aware Minimization
Yifei Cheng, Li Shen, Hao Sun, Nan Yin, Xiaochun Cao, Enhong Chen (30 May 2025) [AAML]

Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté, Mariella Dimiccoli (27 Dec 2024) [AI4TS]

Attribute Inference Attacks for Federated Regression Tasks
Francesco Diana, Othmane Marfoq, Chuan Xu, Giovanni Neglia, F. Giroire, Eoin Thomas (19 Nov 2024) [AAML]

Understanding Adam Requires Better Rotation Dependent Assumptions
Lucas Maes, Tianyue H. Zhang, Alexia Jolicoeur-Martineau, Ioannis Mitliagkas, Damien Scieur, Simon Lacoste-Julien, Charles Guille-Escuret (25 Oct 2024)

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert, M. Safaryan, Ionut-Vlad Modoranu, Dan Alistarh (21 Oct 2024) [ODL]

Attack Anything: Blind DNNs via Universal Background Adversarial Attack
Jiawei Lian, Shaohui Mei, X. Wang, Yi Wang, L. Wang, Yingjie Lu, Mingyang Ma, Lap-Pui Chau (17 Aug 2024) [AAML]

The Implicit Bias of Adam on Separable Data
Chenyang Zhang, Difan Zou, Yuan Cao (15 Jun 2024) [AI4CE]

Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Devyani Maladkar, Ruichen Jiang, Aryan Mokhtari (07 Jun 2024)

Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes
Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu (05 Jun 2024)

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
Ionut-Vlad Modoranu, M. Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtárik, Dan Alistarh (24 May 2024) [MQ]

Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning
Jiawu Tian, Liwei Xu, Xiaowei Zhang, Yongqi Li (02 Apr 2024) [ODL]

Regularized DeepIV with Model Selection
Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara (07 Mar 2024)

Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhimin Luo (26 Feb 2024)

Revisiting Convergence of AdaGrad with Relaxed Assumptions
Yusu Hong, Junhong Lin (21 Feb 2024)

AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau, Han Liu, Mladen Kolar (17 Feb 2024) [ODL]

Towards Quantifying the Preconditioning Effect of Adam
Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon (11 Feb 2024)

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong, Junhong Lin (06 Feb 2024)

Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato, Hideaki Iiduka (04 Feb 2024) [ODL]

Probabilistic Guarantees of Stochastic Recursive Gradient in Non-Convex Finite Sum Problems
Yanjie Zhong, Jiaqi Li, Soumendra Lahiri (29 Jan 2024)

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix
Yun Yue, Zhiling Ye, Jiadi Jiang, Yongchao Liu, Ke Zhang (04 Dec 2023) [ODL]

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato, Hideaki Iiduka (15 Nov 2023)

High Probability Convergence of Adam Under Unbounded Gradients and Affine Variance Noise
Yusu Hong, Junhong Lin (03 Nov 2023)

Demystifying the Myths and Legends of Nonconvex Convergence of SGD
Aritra Dutta, El Houcine Bergou, Soumia Boucherouite, Nicklas Werge, M. Kandemir, Xin Li (19 Oct 2023)

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data
Hao Sun, Li Shen, Shi-Yong Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao (18 Sep 2023) [FedML]

DRAG: Divergence-based Adaptive Aggregation in Federated learning on Non-IID Data
Feng Zhu, Jingjing Zhang, Shengyun Liu, Xin Eric Wang (04 Sep 2023) [FedML]

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup
Yan Sun, Li Shen, Hao Sun, Liang Ding, Dacheng Tao (30 Jul 2023) [FedML]

High Probability Analysis for Non-Convex Stochastic Optimization with Clipping
Shaojie Li, Yong Liu (25 Jul 2023)

Toward Understanding Why Adam Converges Faster Than SGD for Transformers
Yan Pan, Yuanzhi Li (31 May 2023)

Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He (21 May 2023)

Towards Understanding the Generalization of Graph Neural Networks
Huayi Tang, Y. Liu (14 May 2023) [GNN, AI4CE]

UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization
Yiming Jiang, Jinlan Liu, Dongpo Xu, Danilo Mandic (09 May 2023)

Convergence of Adam Under Relaxed Assumptions
Haochuan Li, Alexander Rakhlin, Ali Jadbabaie (27 Apr 2023)

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
Hao Sun, Li Shen, Qihuang Zhong, Liang Ding, Shi-Yong Chen, Jingwei Sun, Jing Li, Guangzhong Sun, Dacheng Tao (01 Mar 2023)

SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
Amit Attia, Tomer Koren (17 Feb 2023) [ODL]

Multilevel Objective-Function-Free Optimization with an Application to Neural Networks Training
Serge Gratton, Alena Kopanicáková, P. Toint (14 Feb 2023)

FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging
Junyi Li, Feihu Huang, Heng-Chiao Huang (13 Feb 2023) [FedML]

Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression
Xiaoyun Li, Ping Li (25 Nov 2022) [FedML]

On the Algorithmic Stability and Generalization of Adaptive Optimization Methods
Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabas Poczos (08 Nov 2022) [ODL, AI4CE]

TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization
Xiang Li, Junchi Yang, Niao He (31 Oct 2022)

Local Model Reconstruction Attacks in Federated Learning and their Uses
Ilias Driouich, Chuan Xu, Giovanni Neglia, F. Giroire, Eoin Thomas (28 Oct 2022) [AAML, FedML]

Communication-Efficient Adam-Type Algorithms for Distributed Data Mining
Wenhan Xian, Feihu Huang, Heng-Chiao Huang (14 Oct 2022) [FedML]

Dissecting adaptive methods in GANs
Samy Jelassi, David Dobre, A. Mensch, Yuanzhi Li, Gauthier Gidel (09 Oct 2022)

Provable Adaptivity of Adam under Non-uniform Smoothness
Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhirui Ma, Tie-Yan Liu, Zhimin Luo, Wei Chen (21 Aug 2022)

Critical Bach Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One
Hideaki Iiduka (21 Aug 2022) [ODL]

Adam Can Converge Without Any Modification On Update Rules
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhimin Luo (20 Aug 2022)

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan (13 Aug 2022) [ODL]

Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer (29 Jul 2022) [ODL]