Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1905.11881
Cited By
Why gradient clipping accelerates training: A theoretical justification for adaptivity
28 May 2019
Junzhe Zhang
Tianxing He
S. Sra
Ali Jadbabaie
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Why gradient clipping accelerates training: A theoretical justification for adaptivity"
50 / 78 papers shown
Title
Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization
Yuqing Yang
Yi Zhou
Zhaosong Lu
49
0
0
29 Mar 2025
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization
Dmitry Kovalev
57
0
0
16 Mar 2025
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
Jiahao Xu
Zikai Zhang
Rui Hu
AAML
FedML
Presented at
ResearchTrend Connect | FedML
on
28 Mar 2025
147
0
0
11 Mar 2025
Reinforcement learning with combinatorial actions for coupled restless bandits
Lily Xu
Bryan Wilder
Elias B. Khalil
Milind Tambe
67
1
0
01 Mar 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
61
1
0
31 Jan 2025
L3Ms -- Lagrange Large Language Models
Guneet S. Dhillon
Xingjian Shi
Yee Whye Teh
Alex Smola
145
0
0
28 Oct 2024
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
Aleksandar Armacki
Shuhua Yu
Pranay Sharma
Gauri Joshi
Dragana Bajović
D. Jakovetić
S. Kar
57
2
0
17 Oct 2024
Extended convexity and smoothness and their applications in deep learning
Binchuan Qi
Wei Gong
Li Li
61
0
0
08 Oct 2024
An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness
Xiaochuan Gong
Jie Hao
Mingrui Liu
43
2
0
28 Sep 2024
Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks
Vivak Patel
Christian Varner
28
0
0
20 Sep 2024
Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm
Jinwei Zhao
Marco Gori
Alessandro Betti
S. Melacci
Hongtao Zhang
Jiedong Liu
Xinhong Hei
30
0
0
10 Sep 2024
A New First-Order Meta-Learning Algorithm with Convergence Guarantees
El Mahdi Chayti
Martin Jaggi
25
1
0
05 Sep 2024
Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation
Jiahao Xu
Zikai Zhang
Rui Hu
44
4
0
02 Sep 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
H. Cai
Sulaiman A. Alghunaim
Ali H.Sayed
43
1
0
18 Jun 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
36
1
0
12 Apr 2024
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance
Qi Zhang
Yi Zhou
Shaofeng Zou
39
3
0
01 Apr 2024
Directional Smoothness and Gradient Methods: Convergence and Adaptivity
Aaron Mishkin
Ahmed Khaled
Yuanhao Wang
Aaron Defazio
Robert Mansel Gower
44
6
0
06 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
39
26
0
29 Feb 2024
On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong
Junhong Lin
46
10
0
06 Feb 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
35
2
0
26 Jan 2024
Bilevel Optimization under Unbounded Smoothness: A New Algorithm and Convergence Analysis
Jie Hao
Xiaochuan Gong
Mingrui Liu
30
7
0
17 Jan 2024
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Haichao Sha
Ruixuan Liu
Yi-xiao Liu
Hong Chen
52
1
0
06 Dec 2023
TorchDEQ: A Library for Deep Equilibrium Models
Zhengyang Geng
J. Zico Kolter
VLM
56
12
0
28 Oct 2023
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
Zijie Pan
Jiachen Lu
Xiatian Zhu
Li Zhang
DiffM
28
11
0
19 Oct 2023
A Novel Gradient Methodology with Economical Objective Function Evaluations for Data Science Applications
Christian Varner
Vivak Patel
23
2
0
19 Sep 2023
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
Samuel Chun-Hei Lam
Justin A. Sirignano
K. Spiliopoulos
30
2
0
28 Aug 2023
Large-kernel Attention for Efficient and Robust Brain Lesion Segmentation
Liam Chalcroft
Ruben Lourencco Pereira
Mikael Brudfors
Andrew S. Kayser
M. D’Esposito
Cathy J. Price
Ioannis Pappas
John Ashburner
ViT
3DV
MedIm
29
8
0
14 Aug 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
36
2
0
14 Jul 2023
Clip21: Error Feedback for Gradient Clipping
Sarit Khirirat
Eduard A. Gorbunov
Samuel Horváth
Rustem Islamov
Fakhri Karray
Peter Richtárik
32
10
0
30 May 2023
Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions
Bo Wang
Huishuai Zhang
Zhirui Ma
Wei Chen
32
49
0
29 May 2023
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
Ahmed F. AbouElhamayed
Angela Cui
Javier Fernandez-Marques
Nicholas D. Lane
Mohamed S. Abdelfattah
MQ
23
4
0
25 May 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
52
128
0
23 May 2023
Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
Junchi Yang
Xiang Li
Ilyas Fatkhullin
Niao He
36
15
0
21 May 2023
Convergence and Privacy of Decentralized Nonconvex Optimization with Gradient Clipping and Communication Compression
Boyue Li
Yuejie Chi
21
12
0
17 May 2023
Global Convergence of Deep Galerkin and PINNs Methods for Solving Partial Differential Equations
Deqing Jiang
Justin A. Sirignano
Samuel N. Cohen
24
6
0
10 May 2023
Automatic Gradient Descent: Deep Learning without Hyperparameters
Jeremy Bernstein
Chris Mingard
Kevin Huang
Navid Azizan
Yisong Yue
ODL
16
17
0
11 Apr 2023
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments
Drew Penney
Bin Li
Lizhong Chen
J. Sydir
Anna Drewek-Ossowicka
R. Illikkal
Charlie Tai
R. Iyer
Andrew J. Herdrich
28
1
0
10 Apr 2023
Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift
Francisco Pérez-Galarce
K. Pichara
P. Huijse
M. Catelán
D. Méry
28
0
0
12 Mar 2023
Improving Training Stability for Multitask Ranking Models in Recommender Systems
Jiaxi Tang
Yoel Drori
Daryl Chang
M. Sathiamoorthy
Justin Gilmer
Li Wei
Xinyang Yi
Lichan Hong
Ed H. Chi
27
10
0
17 Feb 2023
Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize
Mert Gurbuzbalaban
Yuanhan Hu
Umut Simsekli
Lingjiong Zhu
LRM
20
1
0
10 Feb 2023
U-Clip: On-Average Unbiased Stochastic Gradient Clipping
Bryn Elesedy
Marcus Hutter
13
1
0
06 Feb 2023
Wormhole MAML: Meta-Learning in Glued Parameter Space
C. Chang
Yuan Gao
B. Lou
21
0
0
28 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
28
25
0
14 Dec 2022
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
28
6
0
05 Dec 2022
Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments
Michael R. Metel
28
1
0
09 Nov 2022
Deep Learning Object Detection Approaches to Signal Identification
Luke Wood
K. Anderson
Peter Gerstoft
Richard Bell
Raghab Subbaraman
Dinesh Bharadia
13
2
0
27 Oct 2022
Analyzing historical diagnosis code data from NIH N3C and RECOVER Programs using deep learning to determine risk factors for Long Covid
S. Sengupta
Johanna J. Loomba
Suchetha Sharma
Donald E. Brown
L. Thorpe
M. Haendel
C. Chute
Stephanie S. Hong
14
6
0
05 Oct 2022
Taming Fat-Tailed ("Heavier-Tailed'' with Potentially Infinite Variance) Noise in Federated Learning
Haibo Yang
Pei-Yuan Qiu
Jia Liu
FedML
27
12
0
03 Oct 2022
Convergence of Stein Variational Gradient Descent under a Weaker Smoothness Condition
Lukang Sun
Avetik G. Karagulyan
Peter Richtárik
26
19
0
01 Jun 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
20
20
0
28 May 2022
1
2
Next