Why gradient clipping accelerates training: A theoretical justification for adaptivity
Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie
arXiv:1905.11881, 28 May 2019
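The technique the paper studies, norm-based gradient clipping, can be sketched in a few lines. This is a minimal NumPy illustration of the standard method, not code from the paper:

```python
import numpy as np

def clip_gradient(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm (norm clipping)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# One step of clipped gradient descent: x <- x - lr * clip(grad, c).
# The effective step size lr * min(1, c / ||grad||) shrinks when the
# gradient is large; this adaptivity is what the paper's analysis studies.
x = np.array([1.0, 2.0])
grad = np.array([3.0, 4.0])              # norm 5, exceeds the threshold
x = x - 0.1 * clip_gradient(grad, 1.0)   # clipped to norm 1, i.e. [0.6, 0.8]
```

Gradients below the threshold pass through unchanged, so clipping only intervenes on large-gradient steps.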
Papers citing "Why gradient clipping accelerates training: A theoretical justification for adaptivity" (28 of 78 papers shown):

 1. Reachability Constrained Reinforcement Learning
    Dongjie Yu, Haitong Ma, Sheng Li, Jianyu Chen (16 May 2022)
 2. CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
    Zangwei Zheng, Peng Xu, Xuan Zou, Da Tang, Zhen Li, ..., Xiangzhuo Ding, Fuzhao Xue, Ziheng Qing, Youlong Cheng, Yang You (13 Apr 2022) [VLM]
 3. Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
    D. Jakovetić, Dragana Bajović, Anit Kumar Sahu, S. Kar, Nemanja Milošević, Dusan Stamenkovic (06 Apr 2022)
 4. Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)
    Jiayuan Ye, Reza Shokri (10 Mar 2022) [FedML]
 5. Robust Training of Neural Networks Using Scale Invariant Architectures
    Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Surinder Kumar (02 Feb 2022)
 6. FedComm: Federated Learning as a Medium for Covert Communication
    Dorjan Hitaj, Giulio Pagnotta, Briland Hitaj, Fernando Perez-Cruz, L. Mancini (21 Jan 2022) [FedML]
 7. Improving Differentially Private SGD via Randomly Sparsified Gradients
    Junyi Zhu, Matthew B. Blaschko (01 Dec 2021)
 8. As if by magic: self-supervised training of deep despeckling networks with MERLIN
    Emanuele Dalsasso, L. Denis, F. Tupin (25 Oct 2021)
 9. Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis
    Jikai Jin, Samir Bhatt, Haiyang Wang, Liwei Wang (24 Oct 2021)
10. Pixel-Level Face Image Quality Assessment for Explainable Face Recognition
    Philipp Terhörst, Marco Huber, Naser Damer, Florian Kirchbuchner, Kiran Raja, Arjan Kuijper (21 Oct 2021) [CVBM]
11. On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
    Ziqiao Wang, Yongyi Mao (07 Oct 2021) [FedML, MLT]
12. Stochastic Training is Not Necessary for Generalization
    Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021)
13. Activated Gradients for Deep Neural Networks
    Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, Mingsheng Shang (09 Jul 2021) [ODL, AI4CE]
14. Introducing Self-Attention to Target Attentive Graph Neural Networks
    Sai Mitheran, Abhinav Java, Surya Kant Sahu, Arshad Shaikh (04 Jul 2021)
15. Ranger21: a synergistic deep learning optimizer
    Less Wright, Nestor Demeure (25 Jun 2021) [ODL, AI4CE]
16. A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
    Yifei Ding, M. Jia, Qiuhua Miao, Yudong Cao (19 Apr 2021)
17. Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?
    R. Guerraoui, Nirupam Gupta, Rafael Pinot, Sébastien Rouault, John Stephan (16 Feb 2021)
18. Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
    Vien V. Mai, M. Johansson (12 Feb 2021)
19. High-Performance Large-Scale Image Recognition Without Normalization
    Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan (11 Feb 2021) [VLM]
20. Training Deep Architectures Without End-to-End Backpropagation: A Survey on the Provably Optimal Methods
    Shiyu Duan, José C. Príncipe (09 Jan 2021) [MQ]
21. Reverse engineering learned optimizers reveals known and novel mechanisms
    Niru Maheswaranathan, David Sussillo, Luke Metz, Ruoxi Sun, Jascha Narain Sohl-Dickstein (04 Nov 2020)
22. State space models for building control: how deep should you go?
    B. Schubnel, R. Carrillo, Paolo Taddeo, L. C. Casals, J. Salom, Y. Stauffer, P. Alet (23 Oct 2020)
23. Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
    Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov (14 Oct 2020)
24. Review: Deep Learning in Electron Microscopy
    Jeffrey M. Ede (17 Sep 2020)
25. Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
    Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li (28 Jul 2020) [ODL]
26. Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent
    Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta (11 Jun 2020)
27. Gradient Monitored Reinforcement Learning
    Mohammed Sharafath Abdul Hameed, Gavneet Singh Chadha, Andreas Schwung, S. Ding (25 May 2020)
28. A Proximal Stochastic Gradient Method with Progressive Variance Reduction
    Lin Xiao, Tong Zhang (19 Mar 2014) [ODL]