A Bayesian Perspective on Generalization and Stochastic Gradient Descent [BDL]
arXiv: 1710.06451 (v3, latest)
17 October 2017
Samuel L. Smith, Quoc V. Le
Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent" (8 of 108 shown)
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
22 Feb 2018

An Alternative View: When Does SGD Escape Local Minima? [MLT]
Robert D. Kleinberg, Yuanzhi Li, Yang Yuan
17 Feb 2018

signSGD: Compressed Optimisation for Non-Convex Problems [FedML, ODL]
Jeremy Bernstein, Yu Wang, Kamyar Azizzadenesheli, Anima Anandkumar
13 Feb 2018

Deep Learning Scaling is Predictable, Empirically
Joel Hestness, Sharan Narang, Newsha Ardalani, G. Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou
01 Dec 2017

Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Nov 2017

Don't Decay the Learning Rate, Increase the Batch Size [ODL]
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017

Normalized Direction-preserving Adam [ODL]
Zijun Zhang, Lin Ma, Zongpeng Li, Chuan Wu
13 Sep 2017

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates [AI4CE]
L. Smith, Nicholay Topin
23 Aug 2017