Directional Smoothness and Gradient Methods: Convergence and Adaptivity
arXiv: 2403.04081 (v2)
6 March 2024
Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert Mansel Gower
Papers citing "Directional Smoothness and Gradient Methods: Convergence and Adaptivity" (30 papers):
Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation. Tianqi Qiao, Marie Maros. 11 Sep 2025.
Glocal Smoothness: Line Search can really help! Curtis Fox, Aaron Mishkin, Sharan Vaswani, Mark Schmidt. 14 Jun 2025.
New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results. Francesco Orabona, Ryan D'Orazio. 26 May 2025.
Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization. Yue Yang, Yi Zhou, Zhaosong Lu. 29 Mar 2025.
Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes. Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa. 07 Jun 2024.
SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning. Avetik G. Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtárik. 30 May 2024.
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation. Aaron Mishkin, Mert Pilanci, Mark Schmidt. 03 Apr 2024.
Non-Uniform Smoothness for Gradient Descent. A. Berahas, Lindon Roberts, Fred Roosta. 15 Nov 2023.
Normalized Gradients for All. Francesco Orabona. 10 Aug 2023.
Convex and Non-convex Optimization Under Generalized Smoothness. Haochuan Li, Jian Qian, Yi Tian, Alexander Rakhlin, Ali Jadbabaie. Neural Information Processing Systems (NeurIPS), 2023. 02 Jun 2023.
Toward Understanding Why Adam Converges Faster Than SGD for Transformers. Yan Pan, Yuanzhi Li. 31 May 2023.
Adaptive Gradient Methods at the Edge of Stability. Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer. 29 Jul 2022.
Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient. Zhaosong Lu, Sanyou Mei. SIAM Journal on Optimization (SIAM J. Optim.), 2022. 02 Jun 2022.
Making SGD Parameter-Free. Y. Carmon, Oliver Hinder. Annual Conference on Computational Learning Theory (COLT), 2022. 04 May 2022.
Understanding the unstable convergence of gradient descent. Kwangjun Ahn, J.N. Zhang, S. Sra. International Conference on Machine Learning (ICML), 2022. 03 Apr 2022.
A first-order primal-dual method with adaptivity to local smoothness. Maria-Luiza Vladarean, Yura Malitsky, Volkan Cevher. Neural Information Processing Systems (NeurIPS), 2021. 28 Oct 2021.
Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums. Boyao Wang, Haishan Ye, Tong Zhang. 27 Oct 2021.
Leveraging Non-uniformity in First-order Non-convex Optimization. Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvári, Dale Schuurmans. International Conference on Machine Learning (ICML), 2021. 13 May 2021.
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. International Conference on Learning Representations (ICLR), 2021. 26 Feb 2021.
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate. Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora. 06 Oct 2020.
Improved Analysis of Clipping Algorithms for Non-convex Optimization. Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang. Neural Information Processing Systems (NeurIPS), 2020. 05 Oct 2020.
Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis. Courtney Paquette, B. V. Merrienboer, Elliot Paquette, Fabian Pedregosa. 08 Jun 2020.
PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala. Neural Information Processing Systems (NeurIPS), 2019. 03 Dec 2019.
Why gradient clipping accelerates training: A theoretical justification for adaptivity. J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie. International Conference on Learning Representations (ICLR), 2019. 28 May 2019.
Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity. Benjamin Grimmer. 12 Dec 2017.
Online to Offline Conversions, Universality and Adaptive Minibatch Sizes. Kfir Y. Levy. Neural Information Processing Systems (NeurIPS), 2017. 30 May 2017.
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark Schmidt. 16 Aug 2016.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. 06 Feb 2015.
Practical recommendations for gradient-based training of deep architectures. Yoshua Bengio. Neural Networks (NN), 2012. 24 Jun 2012.
Less Regret via Online Conditioning. Matthew J. Streeter, H. B. McMahan. 25 Feb 2010.