Directional Smoothness and Gradient Methods: Convergence and Adaptivity
arXiv: 2403.04081 (v2)
6 March 2024
Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert Mansel Gower
Papers citing "Directional Smoothness and Gradient Methods: Convergence and Adaptivity" (30 papers):
Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation. Tianqi Qiao, Marie Maros. 11 Sep 2025.
Glocal Smoothness: Line Search can really help! Curtis Fox, Aaron Mishkin, Sharan Vaswani, Mark Schmidt. 14 Jun 2025.
New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results. Francesco Orabona, Ryan D'Orazio. 26 May 2025.
Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization. Yue Yang, Yi Zhou, Zhaosong Lu. 29 Mar 2025.
Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes. Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa. 07 Jun 2024.
SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning. Avetik G. Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtárik. 30 May 2024.
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation. Aaron Mishkin, Mert Pilanci, Mark Schmidt. 03 Apr 2024.
Non-Uniform Smoothness for Gradient Descent. A. Berahas, Lindon Roberts, Fred Roosta. 15 Nov 2023.
Normalized Gradients for All. Francesco Orabona. 10 Aug 2023.
Convex and Non-convex Optimization Under Generalized Smoothness. Haochuan Li, Jian Qian, Yi Tian, Alexander Rakhlin, Ali Jadbabaie. Neural Information Processing Systems (NeurIPS), 2023. 02 Jun 2023.
Toward Understanding Why Adam Converges Faster Than SGD for Transformers. Yan Pan, Yuanzhi Li. 31 May 2023.
Adaptive Gradient Methods at the Edge of Stability. Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer. 29 Jul 2022.
Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient. Zhaosong Lu, Sanyou Mei. SIAM Journal on Optimization (SIAM J. Optim.), 2022. 02 Jun 2022.
Making SGD Parameter-Free. Y. Carmon, Oliver Hinder. Annual Conference on Computational Learning Theory (COLT), 2022. 04 May 2022.
Understanding the unstable convergence of gradient descent. Kwangjun Ahn, J.N. Zhang, S. Sra. International Conference on Machine Learning (ICML), 2022. 03 Apr 2022.
A first-order primal-dual method with adaptivity to local smoothness. Maria-Luiza Vladarean, Yura Malitsky, Volkan Cevher. Neural Information Processing Systems (NeurIPS), 2021. 28 Oct 2021.
Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums. Boyao Wang, Haishan Ye, Tong Zhang. 27 Oct 2021.
Leveraging Non-uniformity in First-order Non-convex Optimization. Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvári, Dale Schuurmans. International Conference on Machine Learning (ICML), 2021. 13 May 2021.
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. International Conference on Learning Representations (ICLR), 2021. 26 Feb 2021.
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate. Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora. 06 Oct 2020.
Improved Analysis of Clipping Algorithms for Non-convex Optimization. Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang. Neural Information Processing Systems (NeurIPS), 2020. 05 Oct 2020.
Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis. Courtney Paquette, B. V. Merrienboer, Elliot Paquette, Fabian Pedregosa. 08 Jun 2020.
PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala. Neural Information Processing Systems (NeurIPS), 2019. 03 Dec 2019.
Why gradient clipping accelerates training: A theoretical justification for adaptivity. J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie. International Conference on Learning Representations (ICLR), 2019. 28 May 2019.
Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity. Benjamin Grimmer. 12 Dec 2017.
Online to Offline Conversions, Universality and Adaptive Minibatch Sizes. Kfir Y. Levy. Neural Information Processing Systems (NeurIPS), 2017. 30 May 2017.
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark Schmidt. 16 Aug 2016.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 06 Feb 2015.
Practical recommendations for gradient-based training of deep architectures. Yoshua Bengio. Neural Networks (NN), 2012. 24 Jun 2012.
Less Regret via Online Conditioning. Matthew J. Streeter, H. B. McMahan. 25 Feb 2010.