On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
16 August 2018
Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu

Papers citing "On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization"

Showing 50 of 107 citing papers:
Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness (Hideaki Iiduka; 27 Jun 2022)
Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems (Yunwen Lei; 14 Jun 2022)
On Distributed Adaptive Optimization with Gradient Compression (Xiaoyun Li, Belhal Karimi, Ping Li; 11 May 2022)
Communication-Efficient Adaptive Federated Learning (Yujia Wang, Lu Lin, Jinghui Chen; 05 May 2022) [FedML]
High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize (Ali Kavis, Kfir Y. Levy, Volkan Cevher; 06 Apr 2022)
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam (Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He; 12 Feb 2022) [OffRL, AI4CE]
The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance (Matthew Faw, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, Rachel A. Ward; 11 Feb 2022)
Adapting to Mixing Time in Stochastic Optimization with Markovian Data (Ron Dorfman, Kfir Y. Levy; 09 Feb 2022)
A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization (Tesi Xiao, Krishnakumar Balasubramanian, Saeed Ghadimi; 09 Feb 2022)
Understanding AdamW through Proximal Methods and Scale-Freeness (Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona; 31 Jan 2022)
Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising (Weijie Zhao, Xuewu Jiao, Mingqing Hu, Xiaoyun Li, Xinming Zhang, Ping Li; 05 Jan 2022) [3DV]
Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization (Yujia Wang, Lu Lin, Jinghui Chen; 01 Nov 2021)
A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization (C. Alecsa; 16 Oct 2021)
Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits (Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, T. Zhao, Guanghui Lan; 10 Oct 2021)
Layer-wise and Dimension-wise Locally Adaptive Federated Learning (Belhal Karimi, Ping Li, Xiaoyun Li; 01 Oct 2021) [FedML]
On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization (Zehao Dou, Yuanzhi Li; 29 Sep 2021)
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method (Xiaoxia Wu, Yuege Xie, S. Du, Rachel A. Ward; 17 Sep 2021) [ODL]
On the Convergence of Decentralized Adaptive Gradient Methods (Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li; 07 Sep 2021)
Accelerating Federated Learning with a Global Biased Optimiser (Jed Mills, Jia Hu, Geyong Min, Rui Jin, Siwei Zheng, Jin Wang; 20 Aug 2021) [FedML, AI4CE]
Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization (Aaron Defazio, Samy Jelassi; 26 Jan 2021) [ODL]
Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration (Congliang Chen, Li Shen, Fangyu Zou, Wei Liu; 14 Jan 2021)
A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning (K. Yadav; 07 Jan 2021)
Variance Reduction on General Adaptive Stochastic Mirror Descent (Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng; 26 Dec 2020)
Recent Theoretical Advances in Non-Convex Optimization (Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard A. Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev; 11 Dec 2020)
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance (Anas Barakat, Pascal Bianchi, W. Hachem, S. Schechtman; 07 Dec 2020)
Understanding the Role of Adversarial Regularization in Supervised Learning (Litu Rout; 01 Oct 2020)
A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms (Chao Ma, Lei Wu, E. Weinan; 14 Sep 2020) [ODL]
Binary Search and First Order Gradient Based Method for Stochastic Optimization (V. Pandey; 27 Jul 2020) [ODL]
Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent (Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang; 15 Jul 2020)
AdaSGD: Bridging the gap between SGD and Adam (Jiaxuan Wang, Jenna Wiens; 30 Jun 2020)
Robust Federated Recommendation System (Chen Chen, Jingfeng Zhang, A. Tung, Mohan Kankanhalli, Gang Chen; 15 Jun 2020) [FedML]
Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs (Xunpeng Huang, Hao Zhou, Runxin Xu, Zhe Wang, Lei Li; 12 Jun 2020) [ODL]
Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search) (Sharan Vaswani, I. Laradji, Frederik Kunstner, S. Meng, Mark Schmidt, Simon Lacoste-Julien; 11 Jun 2020)
Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity (J.N. Zhang, Hongzhou Lin, Subhro Das, S. Sra, Ali Jadbabaie; 08 Jun 2020)
Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization (Yangyang Xu, Yibo Xu; 31 May 2020)
MixML: A Unified Analysis of Weakly Consistent Parallel Learning (Yucheng Lu, J. Nash, Christopher De Sa; 14 May 2020) [FedML]
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory (Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo; 21 Apr 2020) [ODL]
A new regret analysis for Adam-type algorithms (Ahmet Alacaoglu, Yura Malitsky, P. Mertikopoulos, Volkan Cevher; 21 Mar 2020) [ODL]
A Simple Convergence Proof of Adam and Adagrad (Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier; 05 Mar 2020)
Parallel and distributed asynchronous adaptive stochastic gradient methods (Yangyang Xu, Yibo Xu, Yonggui Yan, Colin Sutcher-Shepard, Leopold Grinberg, Jiewei Chen; 21 Feb 2020)
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling (Huaqing Xiong, Tengyu Xu, Yingbin Liang, Wei Zhang; 15 Feb 2020)
Gradient descent with momentum --- to accelerate or to super-accelerate? (Goran Nakerst, John Brennan, M. Haque; 17 Jan 2020) [ODL]
On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods (Bingxin Zhou, Xuebin Zheng, Junbin Gao; 17 Jan 2020) [ODL]
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity (Shiyu Liang, Ruoyu Sun, R. Srikant; 31 Dec 2019)
Optimization for deep learning: theory and algorithms (Ruoyu Sun; 19 Dec 2019) [ODL]
Towards Understanding the Spectral Bias of Deep Learning (Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu; 03 Dec 2019)
Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization (Anas Barakat, Pascal Bianchi; 18 Nov 2019)
On Higher-order Moments in Adam (Zhanhong Jiang, Aditya Balu, Sin Yong Tan, Young M. Lee, Chinmay Hegde, Soumik Sarkar; 15 Oct 2019) [ODL]
Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM (Qianqian Tong, Guannan Liang, J. Bi; 02 Aug 2019)
Why gradient clipping accelerates training: A theoretical justification for adaptivity (J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie; 28 May 2019)