Directional Smoothness and Gradient Methods: Convergence and Adaptivity

6 March 2024
Aaron Mishkin
Ahmed Khaled
Yuanhao Wang
Aaron Defazio
Robert Mansel Gower
arXiv: 2403.04081

Papers citing "Directional Smoothness and Gradient Methods: Convergence and Adaptivity"

30 citing papers
Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation
Tianqi Qiao
Marie Maros
11 Sep 2025
Glocal Smoothness: Line Search can really help!
Curtis Fox
Aaron Mishkin
Sharan Vaswani
Mark Schmidt
14 Jun 2025
New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
Francesco Orabona
Ryan D'Orazio
26 May 2025
Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization
Yue Yang
Yi Zhou
Zhaosong Lu
29 Mar 2025
Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes
Si Yi Meng
Antonio Orvieto
Daniel Yiming Cao
Christopher De Sa
07 Jun 2024
SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning
Avetik G. Karagulyan
Egor Shulgin
Abdurakhmon Sadiev
Peter Richtárik
30 May 2024
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Aaron Mishkin
Mert Pilanci
Mark Schmidt
03 Apr 2024
Non-Uniform Smoothness for Gradient Descent
A. Berahas
Lindon Roberts
Fred Roosta
15 Nov 2023
Normalized Gradients for All
Francesco Orabona
10 Aug 2023
Convex and Non-convex Optimization Under Generalized Smoothness. Neural Information Processing Systems (NeurIPS), 2023
Haochuan Li
Jian Qian
Yi Tian
Alexander Rakhlin
Ali Jadbabaie
02 Jun 2023
Toward Understanding Why Adam Converges Faster Than SGD for Transformers
Yan Pan
Yuanzhi Li
31 May 2023
Adaptive Gradient Methods at the Edge of Stability
Jeremy M. Cohen
Behrooz Ghorbani
Shankar Krishnan
Naman Agarwal
Sourabh Medapati
...
Daniel Suo
David E. Cardoze
Zachary Nado
George E. Dahl
Justin Gilmer
29 Jul 2022
Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient. SIAM Journal on Optimization (SIAM J. Optim.), 2022
Zhaosong Lu
Sanyou Mei
02 Jun 2022
Making SGD Parameter-Free. Annual Conference on Computational Learning Theory (COLT), 2022
Y. Carmon
Oliver Hinder
04 May 2022
Understanding the unstable convergence of gradient descent. International Conference on Machine Learning (ICML), 2022
Kwangjun Ahn
J.N. Zhang
S. Sra
03 Apr 2022
A first-order primal-dual method with adaptivity to local smoothness. Neural Information Processing Systems (NeurIPS), 2021
Maria-Luiza Vladarean
Yura Malitsky
Volkan Cevher
28 Oct 2021
Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums
Boyao Wang
Haishan Ye
Tong Zhang
27 Oct 2021
Leveraging Non-uniformity in First-order Non-convex Optimization. International Conference on Machine Learning (ICML), 2021
Jincheng Mei
Yue Gao
Bo Dai
Csaba Szepesvári
Dale Schuurmans
13 May 2021
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. International Conference on Learning Representations (ICLR), 2021
Jeremy M. Cohen
Simran Kaur
Yuanzhi Li
J. Zico Kolter
Ameet Talwalkar
26 Feb 2021
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li
Kaifeng Lyu
Sanjeev Arora
06 Oct 2020
Improved Analysis of Clipping Algorithms for Non-convex Optimization. Neural Information Processing Systems (NeurIPS), 2020
Bohang Zhang
Jikai Jin
Cong Fang
Liwei Wang
05 Oct 2020
Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis
Courtney Paquette
B. V. Merrienboer
Elliot Paquette
Fabian Pedregosa
08 Jun 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library. Neural Information Processing Systems (NeurIPS), 2019
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
03 Dec 2019
Why gradient clipping accelerates training: A theoretical justification for adaptivity. International Conference on Learning Representations (ICLR), 2019
J.N. Zhang
Tianxing He
S. Sra
Ali Jadbabaie
28 May 2019
Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity
Benjamin Grimmer
12 Dec 2017
Online to Offline Conversions, Universality and Adaptive Minibatch Sizes. Neural Information Processing Systems (NeurIPS), 2017
Kfir Y. Levy
30 May 2017
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi
J. Nutini
Mark Schmidt
16 Aug 2016
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
06 Feb 2015
Practical recommendations for gradient-based training of deep architectures. Neural Networks (NN), 2012
Yoshua Bengio
24 Jun 2012
Less Regret via Online Conditioning
Matthew J. Streeter
H. B. McMahan
25 Feb 2010