A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith, Quoc V. Le
BDL
17 October 2017

Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent"
50 / 108 papers shown

Finite Versus Infinite Neural Networks: an Empirical Study
Jaehoon Lee, S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Narain Sohl-Dickstein
31 Jul 2020

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, G. Agrawal
13 Jul 2020

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
15 May 2020

Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster
25 Mar 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL
04 Mar 2020

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
Wesley J. Maddox, Gregory W. Benton, A. Wilson
04 Mar 2020

The Implicit and Explicit Regularization Effects of Dropout
Colin Wei, Sham Kakade, Tengyu Ma
28 Feb 2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith
ODL
24 Feb 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

Rethinking the Hyperparameters for Fine-tuning
Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
VLM
19 Feb 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama
ODL
10 Feb 2020

Optimized Generic Feature Learning for Few-shot Classification across Domains
Tonmoy Saikia, Thomas Brox, Cordelia Schmid
VLM
22 Jan 2020

'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
Moshir Harsh, J. Tubiana, Simona Cocco, R. Monasson
30 Dec 2019

Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
MoMe
11 Dec 2019

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio
AI4CE
04 Dec 2019

Orchestrating the Development Lifecycle of Machine Learning-Based IoT Applications: A Taxonomy and Survey
Bin Qian, Jie Su, Z. Wen, D. N. Jha, Yinhao Li, ..., Albert Y. Zomaya, Omer F. Rana, Lizhe Wang, Maciej Koutny, R. Ranjan
11 Oct 2019

Beyond Human-Level Accuracy: Computational Challenges in Deep Learning
Joel Hestness, Newsha Ardalani, G. Diamos
03 Sep 2019

Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective
Guan-Horng Liu, Evangelos A. Theodorou
AI4CE
28 Aug 2019

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Hao Jin, Dachao Lin, Zhihua Zhang
ODL
18 Aug 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma
10 Jul 2019

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Devansh Arpit, Victor Campos, Yoshua Bengio
05 Jun 2019

Dimensionality compression and expansion in Deep Neural Networks
Stefano Recanatesi, M. Farrell, Madhu S. Advani, Timothy Moore, Guillaume Lajoie, E. Shea-Brown
02 Jun 2019

Are All Layers Created Equal?
Chiyuan Zhang, Samy Bengio, Y. Singer
06 Feb 2019

Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He, Gao Huang, Yang Yuan
ODL, MLT
02 Feb 2019

An Empirical Model of Large-Batch Training
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
14 Dec 2018

Parameter Re-Initialization through Cyclical Batch Size Schedules
Norman Mu, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
ODL
04 Dec 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez
30 Nov 2018

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
13 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
08 Nov 2018

Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
16 Oct 2018

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney
ODL
02 Oct 2018

Fluctuation-dissipation relations for stochastic gradient descent
Sho Yaida
28 Sep 2018

Deep Bilevel Learning
Simon Jenni, Paolo Favaro
NoLa
05 Sep 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
22 Aug 2018

TherML: Thermodynamics of Machine Learning
Alexander A. Alemi, Ian S. Fischer
DRL, AI4CE
11 Jul 2018

Stochastic natural gradient descent draws posterior samples in function space
Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein
BDL
25 Jun 2018

PCA of high dimensional random walks with comparison to neural network training
J. Antognini, Jascha Narain Sohl-Dickstein
OOD
22 Jun 2018

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, S. Akaho, S. Amari
FedML
04 Jun 2018

Understanding Batch Normalization
Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger
01 Jun 2018

Amortized Inference Regularization
Rui Shu, Hung Bui, Shengjia Zhao, Mykel J. Kochenderfer, Stefano Ermon
DRL
23 May 2018

Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle Pérez, Chico Q. Camargo, A. Louis
MLT, AI4CE
22 May 2018

SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li
21 May 2018

DNN or k-NN: That is the Generalize vs. Memorize Question
Gilad Cohen, Guillermo Sapiro, Raja Giryes
17 May 2018

Gaussian Process Behaviour in Wide Deep Neural Networks
A. G. Matthews, Mark Rowland, Jiri Hron, Richard Turner, Zoubin Ghahramani
BDL
30 Apr 2018

Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters, Carlo Luschi
ODL
20 Apr 2018

A Study on Overfitting in Deep Reinforcement Learning
Chiyuan Zhang, Oriol Vinyals, Rémi Munos, Samy Bengio
OffRL, OnRL
18 Apr 2018

Training Tips for the Transformer Model
Martin Popel, Ondrej Bojar
01 Apr 2018

A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
L. Smith
26 Mar 2018

Gradient Descent Quantizes ReLU Network Features
Hartmut Maennel, Olivier Bousquet, Sylvain Gelly
MLT
22 Mar 2018

A Walk with SGD
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
24 Feb 2018