On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,554 citing papers.
The Dynamics of Learning: A Random Matrix Approach
Zhenyu Liao, Romain Couillet · AI4CE · 30 May 2018

How Does Batch Normalization Help Optimization?
Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry · ODL · 29 May 2018

Distilling Knowledge for Search-based Structured Prediction
Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu · 29 May 2018

Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling
Rainer Kelz, Gerhard Widmer · NoLa · 28 May 2018

A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition
Alireza Sepas-Moghaddam, M. A. Haque, P. Correia, Kamal Nasrollahi, T. Moeslund, F. Pereira · CVBM · 25 May 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 24 May 2018

Input and Weight Space Smoothing for Semi-supervised Learning
Safa Cicek, Stefano Soatto · 23 May 2018

Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle Pérez, Chico Q. Camargo, A. Louis · MLT, AI4CE · 22 May 2018

Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Joeri Hermans, Gilles Louppe · 22 May 2018

Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An, Jian-wei Lu, Lexing Ying · 21 May 2018

Never look back - A modified EnKF method and its application to the training of neural networks without back propagation
E. Haber, F. Lucka, Lars Ruthotto · 21 May 2018

SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li · 21 May 2018

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy · 21 May 2018

DNN or k-NN: That is the Generalize vs. Memorize Question
Gilad Cohen, Guillermo Sapiro, Raja Giryes · 17 May 2018

Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
Minjie Wang, Chien-chin Huang, Jinyang Li · FedML · 10 May 2018

On Visual Hallmarks of Robustness to Adversarial Malware
Alex Huang, Abdullah Al-Dujaili, Erik Hemberg, Una-May O’Reilly · AAML · 09 May 2018

SaaS: Speed as a Supervisor for Semi-supervised Learning
Safa Cicek, Alhussein Fawzi, Stefano Soatto · BDL · 02 May 2018

SHADE: Information Based Regularization for Deep Learning
Michael Blot, Thomas Robert, Nicolas Thome, Matthieu Cord · 29 Apr 2018

HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering
Daniel Gribel, Thibaut Vidal · 25 Apr 2018

Path Planning in Support of Smart Mobility Applications using Generative Adversarial Networks
M. Mohammadi, Ala I. Al-Fuqaha, Jun-Seok Oh · GAN · 23 Apr 2018

Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters, Carlo Luschi · ODL · 20 Apr 2018

Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P. Adams, Peter Orbanz · 16 Apr 2018

DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction
Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He, Zhenhua Dong · 12 Apr 2018

Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton · FedML · 09 Apr 2018

The Loss Surface of XOR Artificial Neural Networks
D. Mehta, Xiaojun Zhao, Edgar A. Bernal, D. Wales · 06 Apr 2018

Training Tips for the Transformer Model
Martin Popel, Ondrej Bojar · 01 Apr 2018

Online Second Order Methods for Non-Convex Stochastic Optimizations
Xi-Lin Li · OffRL, ODL · 26 Mar 2018

On the Local Minima of the Empirical Risk
Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan · FedML · 25 Mar 2018

Multiple Sclerosis Lesion Segmentation from Brain MRI via Fully Convolutional Neural Networks
Snehashis Roy, J. Butman, Daniel Reich, P. Calabresi, Dzung L. Pham · MedIm · 24 Mar 2018

A high-bias, low-variance introduction to Machine Learning for physicists
Pankaj Mehta, Marin Bukov, Ching-Hao Wang, A. G. Day, C. Richardson, Charles K. Fisher, D. Schwab · AI4CE · 23 Mar 2018

Gradient Descent Quantizes ReLU Network Features
Hartmut Maennel, Olivier Bousquet, Sylvain Gelly · MLT · 22 Mar 2018

Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks
A. Bustos, A. Pertusa · 22 Mar 2018

Assessing Shape Bias Property of Convolutional Neural Networks
Hossein Hosseini, Baicen Xiao, Mayoore S. Jaiswal, Radha Poovendran · 21 Mar 2018

Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Marco Baity-Jesi, Levent Sagun, Mario Geiger, S. Spigler, Gerard Ben Arous, C. Cammarota, Yann LeCun, Matthieu Wyart, Giulio Biroli · AI4CE · 19 Mar 2018

On the importance of single directions for generalization
Ari S. Morcos, David Barrett, Neil C. Rabinowitz, M. Botvinick · 19 Mar 2018

On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham Kakade · ODL · 15 Mar 2018

Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov, Dmitrii Podoprikhin, T. Garipov, Dmitry Vetrov, A. Wilson · FedML, MoMe · 14 Mar 2018

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, R. Campbell · 08 Mar 2018

Essentially No Barriers in Neural Network Energy Landscape
Felix Dräxler, K. Veschgini, M. Salmhofer, Fred Hamprecht · MoMe · 02 Mar 2018

The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu, Jingfeng Wu, Ting Yu, Lei Wu, Jin Ma · 01 Mar 2018

Neural Inverse Rendering for General Reflectance Photometric Stereo
Tatsunori Taniai, Takanori Maehara · 28 Feb 2018

Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion
Randall Balestriero, H. Glotin, Richard Baraniuk · BDL · 27 Feb 2018

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
T. Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, A. Wilson · UQCV · 27 Feb 2018

Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior
Akshat Dave, Anil Kumar Vadathya, R. Subramanyam, R. Baburajan, Kaushik Mitra · 27 Feb 2018

A Walk with SGD
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio · 24 Feb 2018

Sensitivity and Generalization in Neural Networks: an Empirical Study
Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Narain Sohl-Dickstein · AAML · 23 Feb 2018

Characterizing Implicit Bias in Terms of Optimization Geometry
Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro · AI4CE · 22 Feb 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney · 22 Feb 2018

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini, Chang-rui Liu, Ulfar Erlingsson, Jernej Kos, Basel Alomair · 22 Feb 2018

Improved Techniques For Weakly-Supervised Object Localization
Junsuk Choe, J. Park, Hyunjung Shim · WSOL · 22 Feb 2018