On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,554 citing papers

An Alternative View: When Does SGD Escape Local Minima? Robert D. Kleinberg Yuanzhi Li Yang Yuan MLT 91 317 0 17 Feb 2018
Model compression via distillation and quantization A. Polino Razvan Pascanu Dan Alistarh MQ 88 733 0 15 Feb 2018
A Progressive Batching L-BFGS Method for Machine Learning Raghu Bollapragada Dheevatsa Mudigere J. Nocedal Hao-Jun Michael Shi P. T. P. Tang ODL 109 153 0 15 Feb 2018
Input-Aware Auto-Tuning of Compute-Bound HPC Kernels Philippe Tillet David D. Cox 48 36 0 15 Feb 2018
Stronger generalization bounds for deep nets via a compression approach Sanjeev Arora Rong Ge Behnam Neyshabur Yi Zhang MLT AI4CE 122 643 0 14 Feb 2018
A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization Tianyi Liu Zhehui Chen Enlu Zhou T. Zhao 87 14 0 14 Feb 2018
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization Zhize Li Jian Li 97 116 0 13 Feb 2018
Classification of Things in DBpedia using Deep Neural Networks Rahul Parundekar 40 2 0 07 Feb 2018
Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training Liangchen Luo Jacob Nelson Luis Ceze Amar Phanishayee Arvind Krishnamurthy 52 1 0 30 Jan 2018
On Scale-out Deep Learning Training for Cloud and HPC Srinivas Sridharan K. Vaidyanathan Dhiraj D. Kalamkar Dipankar Das Mikhail E. Smorkalov ... Dheevatsa Mudigere Naveen Mellempudi Sasikanth Avancha Bharat Kaul Pradeep Dubey BDL 62 30 0 24 Jan 2018
Multi-pseudo Regularized Label for Generated Data in Person Re-Identification Y. Huang Jingsong Xu Qiang Wu Zhedong Zheng Zhaoxiang Zhang Jian Zhang GAN 121 114 0 21 Jan 2018
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning Amith R. Mamidala Georgios Kollias C. Ward F. Artico 75 20 0 11 Jan 2018
Theory of Deep Learning IIb: Optimization Properties of SGD Chiyuan Zhang Q. Liao Alexander Rakhlin Brando Miranda Noah Golowich T. Poggio ODL 75 71 0 07 Jan 2018
The Multilinear Structure of ReLU Networks T. Laurent J. V. Brecht 92 51 0 29 Dec 2017
Visualizing the Loss Landscape of Neural Nets Hao Li Zheng Xu Gavin Taylor Christoph Studer Tom Goldstein 272 1,901 0 28 Dec 2017
Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations Yuanzhi Li Tengyu Ma Hongyang R. Zhang 74 31 0 26 Dec 2017
Block-diagonal Hessian-free Optimization for Training Neural Networks Huishuai Zhang Caiming Xiong James Bradbury R. Socher ODL 52 22 0 20 Dec 2017
Continual Prediction of Notification Attendance with Classical and Deep Network Approaches Kleomenis Katevas Ilias Leontiadis M. Pielot Joan Serrà 19 2 0 19 Dec 2017
Parallel Complexity of Forward and Backward Propagation Maxim Naumov 42 8 0 18 Dec 2017
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions George Philipp Basel Alomair J. Carbonell ODL 92 46 0 15 Dec 2017
Integrated Model, Batch and Domain Parallelism in Training Neural Networks A. Gholami A. Azad Peter H. Jin Kurt Keutzer A. Buluç 81 84 0 12 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks Shankar Krishnan Ying Xiao Rif A. Saurous ODL 45 20 0 08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks Aditya Devarakonda Maxim Naumov M. Garland ODL 107 136 0 06 Dec 2017
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution Cong Ma Kaizheng Wang Yuejie Chi Yuxin Chen 125 241 0 28 Nov 2017
Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms Yazhen Wang 54 17 0 27 Nov 2017
Critical Learning Periods in Deep Neural Networks Alessandro Achille Matteo Rovere Stefano Soatto 72 100 0 24 Nov 2017
Deep supervised learning using local errors Hesham Mostafa V. Ramesh Gert Cauwenberghs 68 115 0 17 Nov 2017
A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit S. Cho Sunghun Kang Chang D. Yoo 79 1 0 17 Nov 2017
Decoupled Weight Decay Regularization I. Loshchilov Frank Hutter OffRL 158 2,161 0 14 Nov 2017
Three Factors Influencing Minima in SGD Stanislaw Jastrzebski Zachary Kenton Devansh Arpit Nicolas Ballas Asja Fischer Yoshua Bengio Amos Storkey 85 463 0 13 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train V. Codreanu Damian Podareanu V. Saletore 63 55 0 12 Nov 2017
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory Ron Amit Ron Meir BDL MLT 73 176 0 03 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems Sameer Kumar D. Sreedhar Vaibhav Saxena Yogish Sabharwal Ashish Verma 58 4 0 02 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size Samuel L. Smith Pieter-Jan Kindermans Chris Ying Quoc V. Le ODL 127 996 0 01 Nov 2017
Deep Learning as a Mixed Convex-Combinatorial Optimization Problem A. Friesen Pedro M. Domingos 46 20 0 31 Oct 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks Pratik Chaudhari Stefano Soatto MLT 88 304 0 30 Oct 2017
The Implicit Bias of Gradient Descent on Separable Data Daniel Soudry Elad Hoffer Mor Shpigel Nacson Suriya Gunasekar Nathan Srebro 208 924 0 27 Oct 2017
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior Charles H. Martin Michael W. Mahoney AI4CE 74 64 0 26 Oct 2017
Stability and Generalization of Learning Algorithms that Converge to Global Optima Zachary B. Charles Dimitris Papailiopoulos MLT 57 163 0 23 Oct 2017
Function Norms and Regularization in Deep Networks Amal Rannen Triki Maxim Berman Matthew B. Blaschko 45 2 0 18 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent Samuel L. Smith Quoc V. Le BDL 104 253 0 17 Oct 2017
Searching for Activation Functions Prajit Ramachandran Barret Zoph Quoc V. Le 97 612 0 16 Oct 2017
Generalization in Deep Learning Kenji Kawaguchi L. Kaelbling Yoshua Bengio ODL 164 459 0 16 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition Chun Yang Xu-Cheng Yin Zejun Li Jianwei Wu Chunchao Guo Hongfa Wang Lei Xiao 41 10 0 10 Oct 2017
SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks Martim Brandao K. Hashimoto A. Takanishi 55 6 0 09 Oct 2017
Neural Optimizer Search with Reinforcement Learning Irwan Bello Barret Zoph Vijay Vasudevan Quoc V. Le ODL 88 386 0 21 Sep 2017
ImageNet Training in Minutes Yang You Zhao-jie Zhang Cho-Jui Hsieh J. Demmel Kurt Keutzer VLM LRM 132 57 0 14 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems V. Patel MLT 61 8 0 14 Sep 2017
Normalized Direction-preserving Adam Zijun Zhang Lin Ma Zongpeng Li Chuan Wu ODL 78 29 0 13 Sep 2017
Parallelizing Linear Recurrent Neural Nets Over Sequence Length Eric Martin Chris Cundy 116 104 0 12 Sep 2017