Linear Convergence of Adaptive Stochastic Gradient Descent
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
28 August 2019
Yuege Xie, Xiaoxia Wu, Rachel A. Ward
arXiv: 1908.10525

Papers citing "Linear Convergence of Adaptive Stochastic Gradient Descent"

33 papers
A regret minimization approach to fixed-point iterations
Joon Kwon
25 Sep 2025
On the Convergence of Muon and Beyond
Da Chang, Yongxiang Liu, Ganzhao Yuan
19 Sep 2025
Adaptive Preconditioners Trigger Loss Spikes in Adam
Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Z. Xu
05 Jun 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An, Yuxing Liu, Boyao Wang, Shiqian Ma, Tong Zhang
26 Mar 2025
Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization
Corrado Coppola, Lorenzo Papa, Irene Amerini, L. Palagi
24 Nov 2024
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
Elizabeth Collins-Woodfin, Inbar Seroussi, Begona García Malaxechebarría, Andrew W. Mackenzie, Elliot Paquette, Courtney Paquette
30 May 2024
Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad
Sayantan Choudhury, N. Tupitsa, Nicolas Loizou, Samuel Horváth, Martin Takáč, Eduard A. Gorbunov
05 Mar 2024
Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction
Neural Information Processing Systems (NeurIPS), 2023
Xiao-Yan Jiang, Sebastian U. Stich
11 Aug 2023
Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case
Machine Learning (ML), 2023
Meixuan He, Yuqing Liang, Jinlan Liu, Dongpo Xu
20 Jul 2023
Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization
International Conference on Learning Representations (ICLR), 2023
Anthony Bardou, Patrick Thiran, Thomas Begin
31 May 2023
Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator
Journal of Computational and Graphical Statistics (JCGS), 2023
Haobo Qi, Feifei Wang, Hansheng Wang
13 Apr 2023
TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization
International Conference on Learning Representations (ICLR), 2022
Xiang Li, Junchi Yang, Niao He
31 Oct 2022
On the Convergence of AdaGrad(Norm) on $\mathbb{R}^d$: Beyond Convexity, Non-Asymptotic Rate and Acceleration
Zijian Liu, Ta Duy Nguyen, Alina Ene, Huy Le Nguyen
29 Sep 2022
Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion
Neural Information Processing Systems (NeurIPS), 2022
G. Zhang, Hong-Ming Chiu, Richard Y. Zhang
24 Aug 2022
Improved Policy Optimization for Online Imitation Learning
J. Lavington, Sharan Vaswani, Mark Schmidt
29 Jul 2022
Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer
29 Jul 2022
Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization
Neural Information Processing Systems (NeurIPS), 2022
Junchi Yang, Xiang Li, Niao He
01 Jun 2022
Optimal Algorithms for Stochastic Multi-Level Compositional Optimization
International Conference on Machine Learning (ICML), 2022
Wei Jiang, Bokun Wang, Yibo Wang, Lijun Zhang, Tianbao Yang
15 Feb 2022
Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size
Adityanarayanan Radhakrishnan, M. Belkin, Caroline Uhler
30 Dec 2021
Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2021
Zaiwei Chen, Shancong Mou, S. T. Maguluri
11 Nov 2021
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
Xiaoxia Wu, Yuege Xie, S. Du, Rachel A. Ward
17 Sep 2021
On Faster Convergence of Scaled Sign Gradient Descent
Xiuxian Li, Kuo-Yi Lin, Li Li, Yiguang Hong, Jie-bin Chen
04 Sep 2021
Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
Journal of Nonlinear Science (J. Nonlinear Sci.), 2021
Stephan Wojtowytsch
04 May 2021
Neurons learn slower than they think
I. Kulikovskikh
02 Apr 2021
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
International Conference on Learning Representations (ICLR), 2021
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar
26 Feb 2021
Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes
Journal of Machine Learning (JML), 2021
Steffen Dereich, Sebastian Kassing
16 Feb 2021
Painless step size adaptation for SGD
I. Kulikovskikh, Tarzan Legović
01 Feb 2021
Sequential convergence of AdaGrad algorithm for smooth convex optimization
Operations Research Letters (ORL), 2020
Cheik Traoré, Edouard Pauwels
24 Nov 2020
Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors
Adityanarayanan Radhakrishnan, M. Belkin, Caroline Uhler
18 Sep 2020
A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms
Mathematical and Scientific Machine Learning (MSML), 2020
Chao Ma, Lei Wu, E. Weinan
14 Sep 2020
Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)
Sharan Vaswani, I. Laradji, Frederik Kunstner, S. Meng, Mark Schmidt, Damien Scieur
11 Jun 2020
Choosing the Sample with Lowest Loss makes SGD Robust
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi
10 Jan 2020
Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization
Anas Barakat, Pascal Bianchi
18 Nov 2019