arXiv:1710.11029
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
30 October 2017
Pratik Chaudhari
Stefano Soatto
MLT
Papers citing "Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks" (50 / 112 papers shown)
1. Implicit Gradient Regularization. David Barrett, Benoit Dherin (23 Sep 2020)
2. Predicting Training Time Without Training. Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto (28 Aug 2020)
3. Efficient hyperparameter optimization by way of PAC-Bayes bound minimization. John J. Cherian, Andrew G. Taube, R. McGibbon, Panagiotis Angelikopoulos, Guy Blanc, M. Snarski, D. D. Richman, J. L. Klepeis, D. Shaw (14 Aug 2020)
4. Partial local entropy and anisotropy in deep weight spaces. Daniele Musso (17 Jul 2020)
5. On the Generalization Benefit of Noise in Stochastic Gradient Descent. Samuel L. Smith, Erich Elsen, Soham De [MLT] (26 Jun 2020)
6. Neural Anisotropy Directions. Guillermo Ortiz-Jiménez, Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, P. Frossard (17 Jun 2020)
7. Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks. Umut Simsekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu (16 Jun 2020)
8. Shape Matters: Understanding the Implicit Bias of the Noise Covariance. Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma (15 Jun 2020)
9. The Heavy-Tail Phenomenon in SGD. Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu (08 Jun 2020)
10. Bayesian Neural Network via Stochastic Gradient Descent. Abhinav Sagar [UQCV, BDL] (04 Jun 2020)
11. On Learning Rates and Schrödinger Operators. Bin Shi, Weijie J. Su, Michael I. Jordan (15 Apr 2020)
12. Robust and On-the-fly Dataset Denoising for Image Classification. Jiaming Song, Lunjia Hu, Michael Auli, Yann N. Dauphin, Tengyu Ma [NoLa, OOD] (24 Mar 2020)
13. Stochastic gradient descent with random learning rate. Daniele Musso [ODL] (15 Mar 2020)
14. The Implicit and Explicit Regularization Effects of Dropout. Colin Wei, Sham Kakade, Tengyu Ma (28 Feb 2020)
15. Tensor Decompositions in Deep Learning. D. Bacciu, Danilo P. Mandic (26 Feb 2020)
16. The Early Phase of Neural Network Training. Jonathan Frankle, D. Schwab, Ari S. Morcos (24 Feb 2020)
17. Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise. Umut Simsekli, Lingjiong Zhu, Yee Whye Teh, Mert Gurbuzbalaban (13 Feb 2020)
18. Fantastic Generalization Measures and Where to Find Them. Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio [AI4CE] (04 Dec 2019)
19. On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks. Umut Simsekli, Mert Gurbuzbalaban, T. H. Nguyen, G. Richard, Levent Sagun (29 Nov 2019)
20. Bayesian interpretation of SGD as Ito process. Soma Yokoi, Issei Sato (20 Nov 2019)
21. E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings. Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang [MQ] (29 Oct 2019)
22. A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs. Koyel Mukherjee, Alind Khare, Ashish Verma (25 Oct 2019)
23. A generalization of regularized dual averaging and its dynamics. Shih-Kang Chao, Guang Cheng (22 Sep 2019)
24. Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective. Guan-Horng Liu, Evangelos A. Theodorou [AI4CE] (28 Aug 2019)
25. A Probabilistic Representation of Deep Learning. Xinjie Lan, Kenneth Barner [UQCV, BDL, AI4CE] (26 Aug 2019)
26. Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization. Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, A. Banerjee [ODL] (24 Jul 2019)
27. Neural ODEs as the Deep Limit of ResNets with constant weights. B. Avelin, K. Nystrom [ODL] (28 Jun 2019)
28. First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise. T. H. Nguyen, Umut Simsekli, Mert Gurbuzbalaban, G. Richard (21 Jun 2019)
29. On the interplay between noise and curvature and its effect on optimization and generalization. Valentin Thomas, Fabian Pedregosa, B. V. Merrienboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux (18 Jun 2019)
30. On the Noisy Gradient Descent that Generalizes as SGD. Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu [MLT] (18 Jun 2019)
31. Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets. Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang [BDL] (29 May 2019)
32. The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study. Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith (09 May 2019)
33. Meta-learners' learning dynamics are unlike learners'. Neil C. Rabinowitz [OffRL] (03 May 2019)
34. Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process. Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant (19 Apr 2019)
35. The Information Complexity of Learning Tasks, their Structure and their Distance. Alessandro Achille, Giovanni Paolini, G. Mbeng, Stefano Soatto (05 Apr 2019)
36. Implicit Regularization in Over-parameterized Neural Networks. M. Kubo, Ryotaro Banno, Hidetaka Manabe, Masataka Minoji (05 Mar 2019)
37. An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise. Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba [ODL] (21 Feb 2019)
38. A Simple Baseline for Bayesian Uncertainty in Deep Learning. Wesley J. Maddox, T. Garipov, Pavel Izmailov, Dmitry Vetrov, A. Wilson [BDL, UQCV] (07 Feb 2019)
39. Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent. Wenqing Hu, Zhanxing Zhu, Haoyi Xiong, Jun Huan [MLT] (18 Jan 2019)
40. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks. Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban (18 Jan 2019)
41. Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent. Xiaowu Dai, Yuhua Zhu (03 Dec 2018)
42. On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent. Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez (30 Nov 2018)
43. Deep learning for pedestrians: backpropagation in CNNs. L. Boué [3DV, PINN] (29 Nov 2018)
44. Deep Frank-Wolfe For Neural Network Optimization. Leonard Berrada, Andrew Zisserman, M. P. Kumar [ODL] (19 Nov 2018)
45. Fluctuation-dissipation relations for stochastic gradient descent. Sho Yaida (28 Sep 2018)
46. Don't Use Large Mini-Batches, Use Local SGD. Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi (22 Aug 2018)
47. Ensemble Kalman Inversion: A Derivative-Free Technique For Machine Learning Tasks. Nikola B. Kovachki, Andrew M. Stuart [BDL] (10 Aug 2018)
48. Conditional Prior Networks for Optical Flow. Yanchao Yang, Stefano Soatto [3DPC] (26 Jul 2018)
49. On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length. Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey [ODL] (13 Jul 2018)
50. TherML: Thermodynamics of Machine Learning. Alexander A. Alemi, Ian S. Fischer [DRL, AI4CE] (11 Jul 2018)