
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown

Title
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension Yunfei Teng Wenbo Gao F. Chalus A. Choromańska Shiqian Ma Adrian Weller 134 12 0 24 May 2019
Loss Surface Modality of Feed-Forward Neural Network Architectures Anna Sergeevna Bosman A. Engelbrecht Mardé Helbig 43 9 0 24 May 2019
Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks Yaoyu Zhang Zhi-Qin John Xu Tao Luo Zheng Ma MLT AI4CE 130 38 0 24 May 2019
The role of invariance in spectral complexity-based generalization bounds Konstantinos Pitas Andreas Loukas Mike Davies P. Vandergheynst BDL 16 1 0 23 May 2019
Improving Neural Networks by Adopting Amplifying and Attenuating Neurons Seongmun Jung O. Kwon 16 0 0 23 May 2019
Shaping the learning landscape in neural networks around wide flat minima Carlo Baldassi Fabrizio Pittorino R. Zecchina MLT 75 84 0 20 May 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models Mor Shpigel Nacson Suriya Gunasekar Jason D. Lee Nathan Srebro Daniel Soudry 92 94 0 17 May 2019
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation Linfeng Zhang Jiebo Song Anni Gao Jingwei Chen Chenglong Bao Kaisheng Ma FedML 85 865 0 17 May 2019
Orthogonal Deep Neural Networks Kui Jia Shuai Li Yuxin Wen Tongliang Liu Dacheng Tao 93 134 0 15 May 2019
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping Wu Dong Murat Keçeli Rafael Vescovi Hanyu Li Corey Adams ... T. Uram V. Vishwanath N. Ferrier B. Kasthuri P. Littlewood FedML AI4CE 40 9 0 13 May 2019
Interpreting and Evaluating Neural Network Robustness Fuxun Yu Zhuwei Qin Chenchen Liu Liang Zhao Yanzhi Wang Xiang Chen AAML 54 56 0 10 May 2019
The sharp, the flat and the shallow: Can weakly interacting agents learn to escape bad minima? N. Kantas P. Parpas G. Pavliotis ODL 30 8 0 10 May 2019
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study Daniel S. Park Jascha Narain Sohl-Dickstein Quoc V. Le Samuel L. Smith 96 57 0 09 May 2019
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation Colin Wei Tengyu Ma 85 110 0 09 May 2019
Full-Gradient Representation for Neural Network Visualization Suraj Srinivas François Fleuret MILM FAtt 95 276 0 02 May 2019
SWALP : Stochastic Weight Averaging in Low-Precision Training Guandao Yang Tianyi Zhang Polina Kirichenko Junwen Bai A. Wilson Christopher De Sa 85 97 0 26 Apr 2019
Improved visible to IR image transformation using synthetic data augmentation with cycle-consistent adversarial networks Kyongsik Yun Kevin Yu Joseph Osborne S. Eldin Luan Nguyen Alexander Huyen Thomas Lu GAN 29 19 0 25 Apr 2019
Communication trade-offs for synchronized distributed SGD with large step size Kumar Kshitij Patel Aymeric Dieuleveut FedML 61 27 0 25 Apr 2019
HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning O. Gencoglu M. Gils E. Guldogan Chamin Morikawa Mehmet Süzen M. Gruber J. Leinonen H. Huttunen 98 36 0 16 Apr 2019
MxML: Mixture of Meta-Learners for Few-Shot Classification Minseop Park Jungtaek Kim Saehoon Kim Yanbin Liu Seungjin Choi OODD 33 8 0 11 Apr 2019
A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics Weinan E Chao Ma Lei Wu MLT 77 124 0 08 Apr 2019
Information Bottleneck and its Applications in Deep Learning Hassan Hafez-Kolahi S. Kasaei 53 19 0 07 Apr 2019
Parallelizable Stack Long Short-Term Memory Shuoyang Ding Philipp Koehn 51 3 0 06 Apr 2019
DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis Sangkug Lym Donghyuk Lee Mike O'Connor Niladrish Chatterjee M. Erez 78 37 0 02 Apr 2019
Lautum Regularization for Semi-supervised Transfer Learning Daniel Jakubovitz M. Rodrigues Raja Giryes 67 4 0 02 Apr 2019
Why ResNet Works? Residuals Generalize Fengxiang He Tongliang Liu Dacheng Tao 65 253 0 02 Apr 2019
Optimal Obfuscation Mechanisms via Machine Learning Marco Romanelli K. Chatzikokolakis C. Palamidessi AAML 53 12 0 01 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes Yang You Jing Li Sashank J. Reddi Jonathan Hseu Sanjiv Kumar Srinadh Bhojanapalli Xiaodan Song J. Demmel Kurt Keutzer Cho-Jui Hsieh ODL 292 1,000 0 01 Apr 2019
Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks Mingchen Li Mahdi Soltanolkotabi Samet Oymak NoLa 129 355 0 27 Mar 2019
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism Nikoli Dryden N. Maruyama Tom Benson Tim Moon M. Snir B. Van Essen 69 49 0 15 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training Linjian Ma Gabe Montague Jiayu Ye Z. Yao A. Gholami Kurt Keutzer Michael W. Mahoney 49 24 0 14 Mar 2019
Communication-efficient distributed SGD with Sketching Nikita Ivkin D. Rothchild Enayat Ullah Vladimir Braverman Ion Stoica R. Arora FedML 69 200 0 12 Mar 2019
SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems Beidi Chen Tharun Medini James Farwell Sameh Gobriel Charlie Tai Anshumali Shrivastava 85 105 0 07 Mar 2019
Positively Scale-Invariant Flatness of ReLU Neural Networks Mingyang Yi Qi Meng Wei-neng Chen Zhi-Ming Ma Tie-Yan Liu 76 18 0 06 Mar 2019
Implicit Regularization in Over-parameterized Neural Networks M. Kubo Ryotaro Banno Hidetaka Manabe Masataka Minoji 76 23 0 05 Mar 2019
Deep Learning Based Motion Planning For Autonomous Vehicle Using Spatiotemporal LSTM Network Zhengwei Bai B. Cai Shangguan Wei Linguo Chai 31 27 0 05 Mar 2019
Multilingual Neural Machine Translation with Knowledge Distillation Xu Tan Yi Ren Di He Tao Qin Zhou Zhao Tie-Yan Liu 112 250 0 27 Feb 2019
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise Yeming Wen Kevin Luk Maxime Gazeau Guodong Zhang Harris Chan Jimmy Ba ODL 71 22 0 21 Feb 2019
A Little Is Enough: Circumventing Defenses For Distributed Learning Moran Baruch Gilad Baruch Yoav Goldberg FedML 65 514 0 16 Feb 2019
Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization Hesham Mostafa Xin Wang 114 315 0 15 Feb 2019
Training on the Edge: The why and the how Navjot Kukreja Alena Shilova Olivier Beaumont Jan Huckelheim N. Ferrier P. Hovland Gerard Gorman 49 36 0 13 Feb 2019
Uniform convergence may be unable to explain generalization in deep learning Vaishnavh Nagarajan J. Zico Kolter MoMe AI4CE 98 317 0 13 Feb 2019
Towards moderate overparameterization: global convergence guarantees for training shallow neural networks Samet Oymak Mahdi Soltanolkotabi 63 323 0 12 Feb 2019
Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning Ruqi Zhang Chunyuan Li Jianyi Zhang Changyou Chen A. Wilson BDL 88 278 0 11 Feb 2019
A Simple Baseline for Bayesian Uncertainty in Deep Learning Wesley J. Maddox T. Garipov Pavel Izmailov Dmitry Vetrov A. Wilson BDL UQCV 117 810 0 07 Feb 2019
A Scale Invariant Flatness Measure for Deep Network Minima Akshay Rangamani Nam H. Nguyen Abhishek Kumar Dzung Phan Sang H. Chin T. Tran ODL 88 31 0 06 Feb 2019
Are All Layers Created Equal? Chiyuan Zhang Samy Bengio Y. Singer 111 140 0 06 Feb 2019
Distribution-Dependent Analysis of Gibbs-ERM Principle Ilja Kuzborskij Nicolò Cesa-Bianchi Csaba Szepesvári 74 20 0 05 Feb 2019
Asymmetric Valleys: Beyond Sharp and Flat Local Minima Haowei He Gao Huang Yang Yuan ODL MLT 79 150 0 02 Feb 2019
Episodic Training for Domain Generalization Da Li Jianshu Zhang Yongxin Yang Cong Liu Yi-Zhe Song Timothy M. Hospedales OOD 144 450 0 31 Jan 2019