ResearchTrend.AI
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith, Quoc V. Le
17 October 2017 · arXiv:1710.06451 · BDL

Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent" (8 of 108 shown)
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
22 Feb 2018

An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg, Yuanzhi Li, Yang Yuan
17 Feb 2018 · MLT

signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein, Yu Wang, Kamyar Azizzadenesheli, Anima Anandkumar
13 Feb 2018 · FedML, ODL

Deep Learning Scaling is Predictable, Empirically
Joel Hestness, Sharan Narang, Newsha Ardalani, G. Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou
01 Dec 2017

Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Nov 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017 · ODL

Normalized Direction-preserving Adam
Zijun Zhang, Lin Ma, Zongpeng Li, Chuan Wu
13 Sep 2017 · ODL

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith, Nicholay Topin
23 Aug 2017 · AI4CE