Why gradient clipping accelerates training: A theoretical justification for adaptivity

28 May 2019

Tianxing He

Papers citing "Why gradient clipping accelerates training: A theoretical justification for adaptivity"

50 / 78 papers shown

Title
Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization Yuqing Yang Yi Zhou Zhaosong Lu 49 0 0 29 Mar 2025
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization Dmitry Kovalev 57 0 0 16 Mar 2025
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection Jiahao Xu Zikai Zhang Rui Hu AAML FedML Presented at ResearchTrend Connect | FedML on 28 Mar 2025 147 0 0 11 Mar 2025
Reinforcement learning with combinatorial actions for coupled restless bandits Lily Xu Bryan Wilder Elias B. Khalil Milind Tambe 67 1 0 01 Mar 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers Akiyoshi Tomihari Issei Sato ODL 61 1 0 31 Jan 2025
L3Ms -- Lagrange Large Language Models Guneet S. Dhillon Xingjian Shi Yee Whye Teh Alex Smola 145 0 0 28 Oct 2024
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees Aleksandar Armacki Shuhua Yu Pranay Sharma Gauri Joshi Dragana Bajović D. Jakovetić S. Kar 57 2 0 17 Oct 2024
Extended convexity and smoothness and their applications in deep learning Binchuan Qi Wei Gong Li Li 61 0 0 08 Oct 2024
An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness Xiaochuan Gong Jie Hao Mingrui Liu 43 2 0 28 Sep 2024
Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks Vivak Patel Christian Varner 28 0 0 20 Sep 2024
Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm Jinwei Zhao Marco Gori Alessandro Betti S. Melacci Hongtao Zhang Jiedong Liu Xinhong Hei 30 0 0 10 Sep 2024
A New First-Order Meta-Learning Algorithm with Convergence Guarantees El Mahdi Chayti Martin Jaggi 25 1 0 05 Sep 2024
Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation Jiahao Xu Zikai Zhang Rui Hu 44 4 0 02 Sep 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum H. Cai Sulaiman A. Alghunaim Ali H. Sayed 43 1 0 18 Jun 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks Matteo Tucat Anirbit Mukherjee Procheta Sen Mingfei Sun Omar Rivasplata MLT 36 1 0 12 Apr 2024
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance Qi Zhang Yi Zhou Shaofeng Zou 39 3 0 01 Apr 2024
Directional Smoothness and Gradient Methods: Convergence and Adaptivity Aaron Mishkin Ahmed Khaled Yuanhao Wang Aaron Defazio Robert Mansel Gower 44 6 0 06 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models Frederik Kunstner Robin Yadav Alan Milligan Mark Schmidt Alberto Bietti 39 26 0 29 Feb 2024
On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions Yusu Hong Junhong Lin 46 10 0 06 Feb 2024
Regularized Q-Learning with Linear Function Approximation Jiachen Xi Alfredo Garcia P. Momcilovic 35 2 0 26 Jan 2024
Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis Jie Hao Xiaochuan Gong Mingrui Liu 30 7 0 17 Jan 2024
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance Haichao Sha Ruixuan Liu Yi-xiao Liu Hong Chen 52 1 0 06 Dec 2023
TorchDEQ: A Library for Deep Equilibrium Models Zhengyang Geng J. Zico Kolter VLM 56 12 0 28 Oct 2023
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping Zijie Pan Jiachen Lu Xiatian Zhu Li Zhang DiffM 28 11 0 19 Oct 2023
A Novel Gradient Methodology with Economical Objective Function Evaluations for Data Science Applications Christian Varner Vivak Patel 23 2 0 19 Sep 2023
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences Samuel Chun-Hei Lam Justin A. Sirignano K. Spiliopoulos 30 2 0 28 Aug 2023
Large-kernel Attention for Efficient and Robust Brain Lesion Segmentation Liam Chalcroft Ruben Lourenço Pereira Mikael Brudfors Andrew S. Kayser M. D’Esposito Cathy J. Price Ioannis Pappas John Ashburner ViT 3DV MedIm 29 8 0 14 Aug 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness Manos Kirtas Nikolaos Passalis Anastasios Tefas AAML OOD 36 2 0 14 Jul 2023
Clip21: Error Feedback for Gradient Clipping Sarit Khirirat Eduard A. Gorbunov Samuel Horváth Rustem Islamov Fakhri Karray Peter Richtárik 32 10 0 30 May 2023
Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions Bo Wang Huishuai Zhang Zhirui Ma Wei Chen 32 49 0 29 May 2023
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration Ahmed F. AbouElhamayed Angela Cui Javier Fernandez-Marques Nicholas D. Lane Mohamed S. Abdelfattah MQ 23 4 0 25 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training Hong Liu Zhiyuan Li David Leo Wright Hall Percy Liang Tengyu Ma VLM 52 128 0 23 May 2023
Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods Junchi Yang Xiang Li Ilyas Fatkhullin Niao He 36 15 0 21 May 2023
Convergence and Privacy of Decentralized Nonconvex Optimization with Gradient Clipping and Communication Compression Boyue Li Yuejie Chi 21 12 0 17 May 2023
Global Convergence of Deep Galerkin and PINNs Methods for Solving Partial Differential Equations Deqing Jiang Justin A. Sirignano Samuel N. Cohen 24 6 0 10 May 2023
Automatic Gradient Descent: Deep Learning without Hyperparameters Jeremy Bernstein Chris Mingard Kevin Huang Navid Azizan Yisong Yue ODL 16 17 0 11 Apr 2023
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments Drew Penney Bin Li Lizhong Chen J. Sydir Anna Drewek-Ossowicka R. Illikkal Charlie Tai R. Iyer Andrew J. Herdrich 28 1 0 10 Apr 2023
Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift Francisco Pérez-Galarce K. Pichara P. Huijse M. Catelán D. Méry 28 0 0 12 Mar 2023
Improving Training Stability for Multitask Ranking Models in Recommender Systems Jiaxi Tang Yoel Drori Daryl Chang M. Sathiamoorthy Justin Gilmer Li Wei Xinyang Yi Lichan Hong Ed H. Chi 27 10 0 17 Feb 2023
Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize Mert Gurbuzbalaban Yuanhan Hu Umut Simsekli Lingjiong Zhu LRM 20 1 0 10 Feb 2023
U-Clip: On-Average Unbiased Stochastic Gradient Clipping Bryn Elesedy Marcus Hutter 13 1 0 06 Feb 2023
Wormhole MAML: Meta-Learning in Glued Parameter Space C. Chang Yuan Gao B. Lou 21 0 0 28 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos Hao-Wen Dong Naoya Takahashi Yuki Mitsufuji Julian McAuley Taylor Berg-Kirkpatrick VLM CLIP 28 25 0 14 Dec 2022
On the Overlooked Structure of Stochastic Gradients Zeke Xie Qian-Yuan Tang Mingming Sun P. Li 28 6 0 05 Dec 2022
Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments Michael R. Metel 28 1 0 09 Nov 2022
Deep Learning Object Detection Approaches to Signal Identification Luke Wood K. Anderson Peter Gerstoft Richard Bell Raghab Subbaraman Dinesh Bharadia 13 2 0 27 Oct 2022
Analyzing historical diagnosis code data from NIH N3C and RECOVER Programs using deep learning to determine risk factors for Long Covid S. Sengupta Johanna J. Loomba Suchetha Sharma Donald E. Brown L. Thorpe M. Haendel C. Chute Stephanie S. Hong 14 6 0 05 Oct 2022
Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning Haibo Yang Pei-Yuan Qiu Jia Liu FedML 27 12 0 03 Oct 2022
Convergence of Stein Variational Gradient Descent under a Weaker Smoothness Condition Lukang Sun Avetik G. Karagulyan Peter Richtárik 26 19 0 01 Jun 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis Pranav Jeevan Kavitha Viswanathan Anandu A S A. Sethi 20 20 0 28 May 2022