On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
16 August 2018
Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu

Papers citing "On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization"

Showing 50 of 107 citing papers:
Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness (Hideaki Iiduka; 27 Jun 2022)
Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems (Yunwen Lei; 14 Jun 2022)
On Distributed Adaptive Optimization with Gradient Compression (Xiaoyun Li, Belhal Karimi, Ping Li; 11 May 2022)
Communication-Efficient Adaptive Federated Learning (Yujia Wang, Lu Lin, Jinghui Chen; 05 May 2022) [FedML]
High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize (Ali Kavis, Kfir Y. Levy, Volkan Cevher; 06 Apr 2022)
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam (Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He; 12 Feb 2022) [OffRL, AI4CE]
The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance (Matthew Faw, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, Rachel A. Ward; 11 Feb 2022)
Adapting to Mixing Time in Stochastic Optimization with Markovian Data (Ron Dorfman, Kfir Y. Levy; 09 Feb 2022)
A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization (Tesi Xiao, Krishnakumar Balasubramanian, Saeed Ghadimi; 09 Feb 2022)
Understanding AdamW through Proximal Methods and Scale-Freeness (Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona; 31 Jan 2022)
Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising (Weijie Zhao, Xuewu Jiao, Mingqing Hu, Xiaoyun Li, Xinming Zhang, Ping Li; 05 Jan 2022) [3DV]
Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization (Yujia Wang, Lu Lin, Jinghui Chen; 01 Nov 2021)
A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization (C. Alecsa; 16 Oct 2021)
Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits (Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, T. Zhao, Guanghui Lan; 10 Oct 2021)
Layer-wise and Dimension-wise Locally Adaptive Federated Learning (Belhal Karimi, Ping Li, Xiaoyun Li; 01 Oct 2021) [FedML]
On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization (Zehao Dou, Yuanzhi Li; 29 Sep 2021)
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method (Xiaoxia Wu, Yuege Xie, S. Du, Rachel A. Ward; 17 Sep 2021) [ODL]
On the Convergence of Decentralized Adaptive Gradient Methods (Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li; 07 Sep 2021)
Accelerating Federated Learning with a Global Biased Optimiser (Jed Mills, Jia Hu, Geyong Min, Rui Jin, Siwei Zheng, Jin Wang; 20 Aug 2021) [FedML, AI4CE]
Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization (Aaron Defazio, Samy Jelassi; 26 Jan 2021) [ODL]
Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration (Congliang Chen, Li Shen, Fangyu Zou, Wei Liu; 14 Jan 2021)
A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning (K. Yadav; 07 Jan 2021)
Variance Reduction on General Adaptive Stochastic Mirror Descent (Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng; 26 Dec 2020)
Recent Theoretical Advances in Non-Convex Optimization (Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard A. Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev; 11 Dec 2020)
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance (Anas Barakat, Pascal Bianchi, W. Hachem, S. Schechtman; 07 Dec 2020)
Understanding the Role of Adversarial Regularization in Supervised Learning (Litu Rout; 01 Oct 2020)
A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms (Chao Ma, Lei Wu, E. Weinan; 14 Sep 2020) [ODL]
Binary Search and First Order Gradient Based Method for Stochastic Optimization (V. Pandey; 27 Jul 2020) [ODL]
Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent (Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang; 15 Jul 2020)
AdaSGD: Bridging the gap between SGD and Adam (Jiaxuan Wang, Jenna Wiens; 30 Jun 2020)
Robust Federated Recommendation System (Chen Chen, Jingfeng Zhang, A. Tung, Mohan Kankanhalli, Gang Chen; 15 Jun 2020) [FedML]
Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs (Xunpeng Huang, Hao Zhou, Runxin Xu, Zhe Wang, Lei Li; 12 Jun 2020) [ODL]
Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search) (Sharan Vaswani, I. Laradji, Frederik Kunstner, S. Meng, Mark Schmidt, Simon Lacoste-Julien; 11 Jun 2020)
Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity (J.N. Zhang, Hongzhou Lin, Subhro Das, S. Sra, Ali Jadbabaie; 08 Jun 2020)
Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization (Yangyang Xu, Yibo Xu; 31 May 2020)
MixML: A Unified Analysis of Weakly Consistent Parallel Learning (Yucheng Lu, J. Nash, Christopher De Sa; 14 May 2020) [FedML]
AdaX: Adaptive Gradient Descent with Exponential Long Term Memory (Wenjie Li, Zhaoyang Zhang, Xinjiang Wang, Ping Luo; 21 Apr 2020) [ODL]
A new regret analysis for Adam-type algorithms (Ahmet Alacaoglu, Yura Malitsky, P. Mertikopoulos, Volkan Cevher; 21 Mar 2020) [ODL]
A Simple Convergence Proof of Adam and Adagrad (Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier; 05 Mar 2020)
Parallel and distributed asynchronous adaptive stochastic gradient methods (Yangyang Xu, Yibo Xu, Yonggui Yan, Colin Sutcher-Shepard, Leopold Grinberg, Jiewei Chen; 21 Feb 2020)
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling (Huaqing Xiong, Tengyu Xu, Yingbin Liang, Wei Zhang; 15 Feb 2020)
Gradient descent with momentum --- to accelerate or to super-accelerate? (Goran Nakerst, John Brennan, M. Haque; 17 Jan 2020) [ODL]
On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods (Bingxin Zhou, Xuebin Zheng, Junbin Gao; 17 Jan 2020) [ODL]
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity (Shiyu Liang, Ruoyu Sun, R. Srikant; 31 Dec 2019)
Optimization for deep learning: theory and algorithms (Ruoyu Sun; 19 Dec 2019) [ODL]
Towards Understanding the Spectral Bias of Deep Learning (Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu; 03 Dec 2019)
Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization (Anas Barakat, Pascal Bianchi; 18 Nov 2019)
On Higher-order Moments in Adam (Zhanhong Jiang, Aditya Balu, Sin Yong Tan, Young M. Lee, Chinmay Hegde, Soumik Sarkar; 15 Oct 2019) [ODL]
Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM (Qianqian Tong, Guannan Liang, J. Bi; 02 Aug 2019)
Why gradient clipping accelerates training: A theoretical justification for adaptivity (J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie; 28 May 2019)