ResearchTrend.AI

Why gradient clipping accelerates training: A theoretical justification for adaptivity (arXiv:1905.11881)

28 May 2019
Jingzhao Zhang
Tianxing He
Suvrit Sra
Ali Jadbabaie
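The paper studies gradient descent with norm-based clipping, which rescales the gradient whenever its norm exceeds a threshold. A minimal sketch of that update rule on a toy quadratic (an illustration only, not the authors' code; the learning rate and threshold are arbitrary):

```python
import numpy as np

def clip_gradient(grad, max_norm):
    """Scale grad down so its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# Clipped gradient descent on f(x) = 0.5 * ||x||^2, whose gradient is x.
x = np.array([10.0, -10.0])
lr, max_norm = 0.1, 1.0
for _ in range(200):
    grad = x  # gradient of 0.5 * ||x||^2
    x = x - lr * clip_gradient(grad, max_norm)

print(np.linalg.norm(x))  # small: the iterate has converged toward 0
```

Far from the optimum the step size is effectively capped at `lr * max_norm`, while near the optimum the update reduces to plain gradient descent; this adaptivity is the behavior the paper analyzes.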

Papers citing "Why gradient clipping accelerates training: A theoretical justification for adaptivity"

28 / 78 papers shown
  • Reachability Constrained Reinforcement Learning
    Dongjie Yu, Haitong Ma, Sheng Li, Jianyu Chen (16 May 2022)
  • CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
    Zangwei Zheng, Peng Xu, Xuan Zou, Da Tang, Zhen Li, ..., Xiangzhuo Ding, Fuzhao Xue, Ziheng Qing, Youlong Cheng, Yang You [VLM] (13 Apr 2022)
  • Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
    D. Jakovetić, Dragana Bajović, Anit Kumar Sahu, S. Kar, Nemanja Milošević, Dusan Stamenkovic (06 Apr 2022)
  • Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)
    Jiayuan Ye, Reza Shokri [FedML] (10 Mar 2022)
  • Robust Training of Neural Networks Using Scale Invariant Architectures
    Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar (02 Feb 2022)
  • FedComm: Federated Learning as a Medium for Covert Communication
    Dorjan Hitaj, Giulio Pagnotta, Briland Hitaj, Fernando Perez-Cruz, L. Mancini [FedML] (21 Jan 2022)
  • Improving Differentially Private SGD via Randomly Sparsified Gradients
    Junyi Zhu, Matthew B. Blaschko (01 Dec 2021)
  • As if by magic: self-supervised training of deep despeckling networks with MERLIN
    Emanuele Dalsasso, L. Denis, F. Tupin (25 Oct 2021)
  • Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis
    Jikai Jin, Samir Bhatt, Haiyang Wang, Liwei Wang (24 Oct 2021)
  • Pixel-Level Face Image Quality Assessment for Explainable Face Recognition
    Philipp Terhörst, Marco Huber, Naser Damer, Florian Kirchbuchner, Kiran Raja, Arjan Kuijper [CVBM] (21 Oct 2021)
  • On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
    Ziqiao Wang, Yongyi Mao [FedML, MLT] (07 Oct 2021)
  • Stochastic Training is Not Necessary for Generalization
    Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021)
  • Activated Gradients for Deep Neural Networks
    Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, Mingsheng Shang [ODL, AI4CE] (09 Jul 2021)
  • Introducing Self-Attention to Target Attentive Graph Neural Networks
    Sai Mitheran, Abhinav Java, Surya Kant Sahu, Arshad Shaikh (04 Jul 2021)
  • Ranger21: a synergistic deep learning optimizer
    Less Wright, Nestor Demeure [ODL, AI4CE] (25 Jun 2021)
  • A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
    Yifei Ding, M. Jia, Qiuhua Miao, Yudong Cao (19 Apr 2021)
  • Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?
    R. Guerraoui, Nirupam Gupta, Rafael Pinot, Sébastien Rouault, John Stephan (16 Feb 2021)
  • Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
    Vien V. Mai, M. Johansson (12 Feb 2021)
  • High-Performance Large-Scale Image Recognition Without Normalization
    Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan [VLM] (11 Feb 2021)
  • Training Deep Architectures Without End-to-End Backpropagation: A Survey on the Provably Optimal Methods
    Shiyu Duan, José C. Príncipe [MQ] (09 Jan 2021)
  • Reverse engineering learned optimizers reveals known and novel mechanisms
    Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Narain Sohl-Dickstein (04 Nov 2020)
  • State space models for building control: how deep should you go?
    B. Schubnel, R. Carrillo, Paolo Taddeo, L. C. Casals, J. Salom, Y. Stauffer, P. Alet (23 Oct 2020)
  • Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
    Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov (14 Oct 2020)
  • Review: Deep Learning in Electron Microscopy
    Jeffrey M. Ede (17 Sep 2020)
  • Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
    Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li [ODL] (28 Jul 2020)
  • Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent
    Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta (11 Jun 2020)
  • Gradient Monitored Reinforcement Learning
    Mohammed Sharafath Abdul Hameed, Gavneet Singh Chadha, Andreas Schwung, S. Ding (25 May 2020)
  • A Proximal Stochastic Gradient Method with Progressive Variance Reduction
    Lin Xiao, Tong Zhang [ODL] (19 Mar 2014)