arXiv:1803.00195
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma
1 March 2018
Papers citing "The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects" (21 papers)
Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning. Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Guosheng Lin, E. Weinan. 12 Oct 2020.
S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima. Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin. 05 Sep 2020.
Obtaining Adjustable Regularization for Free via Iterate Averaging. Jingfeng Wu, Vladimir Braverman, Lin F. Yang. 15 Aug 2020.
Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD. Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun. 15 Jun 2020.
The Heavy-Tail Phenomenon in SGD. Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu. 08 Jun 2020.
The Implicit Regularization of Stochastic Gradient Flow for Least Squares. Alnur Ali, Yan Sun, Robert Tibshirani. 17 Mar 2020.
The Implicit and Explicit Regularization Effects of Dropout. Colin Wei, Sham Kakade, Tengyu Ma. 28 Feb 2020.
The Break-Even Point on Optimization Trajectories of Deep Neural Networks. Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras. 21 Feb 2020.
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks. Umut Simsekli, Mert Gurbuzbalaban, T. H. Nguyen, G. Richard, Levent Sagun. 29 Nov 2019.
Non-Gaussianity of Stochastic Gradient Noise. A. Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli. 21 Oct 2019.
Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective. Guan-Horng Liu, Evangelos A. Theodorou. 28 Aug 2019.
First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise. T. H. Nguyen, Umut Simsekli, Mert Gurbuzbalaban, G. Richard. 21 Jun 2019.
On the interplay between noise and curvature and its effect on optimization and generalization. Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux. 18 Jun 2019.
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent. Frederik Kunstner, Lukas Balles, Philipp Hennig. 29 May 2019.
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise. Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba. 21 Feb 2019.
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks. Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban. 18 Jan 2019.
The capacity of feedforward neural networks. Pierre Baldi, Roman Vershynin. 02 Jan 2019.
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent. Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez. 30 Nov 2018.
Large batch size training of neural networks with adversarial training and second-order information. Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney. 02 Oct 2018.
Fluctuation-dissipation relations for stochastic gradient descent. Sho Yaida. 28 Sep 2018.
Don't Use Large Mini-Batches, Use Local SGD. Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi. 22 Aug 2018.