ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
arXiv:1609.04836 · 15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
[ODL]

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,554 citing papers:
  • Accurate, Efficient and Scalable Graph Embedding [GNN] (28 Oct 2018)
    Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna
  • Can We Gain More from Orthogonality Regularizations in Training Deep CNNs? [OOD] (22 Oct 2018)
    Nitin Bansal, Xiaohan Chen, Zhangyang Wang
  • A Modern Take on the Bias-Variance Tradeoff in Neural Networks (19 Oct 2018)
    Brady Neal, Sarthak Mittal, A. Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas
  • Sequenced-Replacement Sampling for Deep Learning (19 Oct 2018)
    C. Ho, Dae Hoon Park, Wei Yang, Yi Chang
  • The loss surface of deep linear networks viewed through the algebraic geometry lens [ODL] (17 Oct 2018)
    D. Mehta, Tianran Chen, Tingting Tang, J. Hauenstein
  • Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks (16 Oct 2018)
    Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
  • Detecting Memorization in ReLU Networks (08 Oct 2018)
    Edo Collins, Siavash Bigdeli, Sabine Süsstrunk
  • Toward Understanding the Impact of Staleness in Distributed Machine Learning (08 Oct 2018)
    Wei-Ming Dai, Yi Zhou, Nanqing Dong, Huatian Zhang, Eric Xing
  • Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning [AI4CE] (02 Oct 2018)
    Charles H. Martin, Michael W. Mahoney
  • Large batch size training of neural networks with adversarial training and second-order information [ODL] (02 Oct 2018)
    Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney
  • Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning (29 Sep 2018)
    Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
  • Interpreting Adversarial Robustness: A View from Decision Surface in Input Space [AAML, OOD] (29 Sep 2018)
    Fuxun Yu, Chenchen Liu, Yanzhi Wang, Liang Zhao, Xiang Chen
  • GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration [GP] (28 Sep 2018)
    Jacob R. Gardner, Geoff Pleiss, D. Bindel, Kilian Q. Weinberger, A. Wilson
  • A theoretical framework for deep locally connected ReLU network [PINN] (28 Sep 2018)
    Yuandong Tian
  • Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks [OOD, UQCV] (24 Sep 2018)
    I. Cortés-Ciriano, A. Bender
  • Identifying Generalization Properties in Neural Networks (19 Sep 2018)
    Huan Wang, N. Keskar, Caiming Xiong, R. Socher
  • Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform (08 Sep 2018)
    Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng
  • Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation [ODL] (27 Aug 2018)
    Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji
  • Don't Use Large Mini-Batches, Use Local SGD (22 Aug 2018)
    Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
  • Understanding training and generalization in deep learning by Fourier analysis [AI4CE] (13 Aug 2018)
    Zhi-Qin John Xu
  • Fast Variance Reduction Method with Stochastic Batch Size (07 Aug 2018)
    Xuanqing Liu, Cho-Jui Hsieh
  • Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data [MLT] (03 Aug 2018)
    Yuanzhi Li, Yingyu Liang
  • Generalization Error in Deep Learning [AI4CE] (03 Aug 2018)
    Daniel Jakubovitz, Raja Giryes, M. Rodrigues
  • Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes (30 Jul 2018)
    Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
  • Learning Representations for Soft Skill Matching (20 Jul 2018)
    L. Sayfullina, Eric Malmi, Arno Solin
  • On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length [ODL] (13 Jul 2018)
    Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
  • Efficient Decentralized Deep Learning by Dynamic Model Averaging (09 Jul 2018)
    Michael Kamp, Linara Adilova, Joachim Sicking, Fabian Hüger, Peter Schlicht, Tim Wirtz, Stefan Wrobel
  • The Goldilocks zone: Towards better understanding of neural network loss landscapes (06 Jul 2018)
    Stanislav Fort, Adam Scherlis
  • Fuzzy Logic Interpretation of Quadratic Networks (04 Jul 2018)
    Fenglei Fan, Ge Wang
  • Optimization of neural networks via finite-value quantum fluctuations (01 Jul 2018)
    Masayuki Ohzeki, Shuntaro Okada, Masayoshi Terabe, S. Taguchi
  • Graph-to-Sequence Learning using Gated Graph Neural Networks [GNN] (26 Jun 2018)
    Daniel Beck, Gholamreza Haffari, Trevor Cohn
  • Stochastic natural gradient descent draws posterior samples in function space [BDL] (25 Jun 2018)
    Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein
  • Pushing the boundaries of parallel Deep Learning -- A practical approach [OOD] (25 Jun 2018)
    Paolo Viviani, M. Drocco, Marco Aldinucci
  • Character-Level Feature Extraction with Densely Connected Networks [3DV] (24 Jun 2018)
    Chanhee Lee, Young-Bum Kim, Dongyub Lee, Heuiseok Lim
  • PCA of high dimensional random walks with comparison to neural network training [OOD] (22 Jun 2018)
    J. Antognini, Jascha Narain Sohl-Dickstein
  • On the Spectral Bias of Neural Networks (22 Jun 2018)
    Nasim Rahaman, A. Baratin, Devansh Arpit, Felix Dräxler, Min Lin, Fred Hamprecht, Yoshua Bengio, Aaron Courville
  • Faster SGD training by minibatch persistency (19 Jun 2018)
    M. Fischetti, Iacopo Mandatelli, Domenico Salvagnin
  • Using Mode Connectivity for Loss Landscape Analysis (18 Jun 2018)
    Akhilesh Deepak Gotmare, N. Keskar, Caiming Xiong, R. Socher
  • Laplacian Smoothing Gradient Descent [ODL] (17 Jun 2018)
    Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, A. Lin
  • There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average (14 Jun 2018)
    Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, A. Wilson
  • Knowledge Distillation by On-the-Fly Native Ensemble (12 Jun 2018)
    Xu Lan, Xiatian Zhu, S. Gong
  • The Effect of Network Width on the Performance of Large-batch Training (11 Jun 2018)
    Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris
  • Towards Binary-Valued Gates for Robust LSTM Training [MQ] (08 Jun 2018)
    Zhuohan Li, Di He, Fei Tian, Wei-neng Chen, Tao Qin, Liwei Wang, Tie-Yan Liu
  • Training Faster by Separating Modes of Variation in Batch-normalized Models (07 Jun 2018)
    Mahdi M. Kalayeh, M. Shah
  • Implicit regularization and solution uniqueness in over-parameterized matrix sensing (06 Jun 2018)
    Kelly Geyer, Anastasios Kyrillidis, A. Kalev
  • Layer rotation: a surprisingly powerful indicator of generalization in deep networks? [MLT] (05 Jun 2018)
    Simon Carbonnelle, Christophe De Vleeschouwer
  • Backdrop: Stochastic Backpropagation (04 Jun 2018)
    Siavash Golkar, Kyle Cranmer
  • Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach [FedML] (04 Jun 2018)
    Ryo Karakida, S. Akaho, S. Amari
  • Implicit Bias of Gradient Descent on Linear Convolutional Networks [MDE] (01 Jun 2018)
    Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro
  • Understanding Batch Normalization (01 Jun 2018)
    Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger