On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
ArXiv (abs) · PDF · HTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Extending the step-size restriction for gradient descent to avoid strict saddle points
Hayden Schaeffer
S. McCalla
107
4
0
05 Aug 2019
Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM
Qianqian Tong
Guannan Liang
J. Bi
105
7
0
02 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
90
59
0
30 Jul 2019
Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi
Saar Barkai
Moshe Gabel
Assaf Schuster
93
23
0
26 Jul 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li
Qilong Gu
Yingxue Zhou
Tiancong Chen
A. Banerjee
ODL
88
52
0
24 Jul 2019
BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
Shang Wang
Yifan Bai
Gennady Pekhimenko
60
7
0
23 Jul 2019
Spectral Analysis of Latent Representations
Justin Shenk
Mats L. Richter
Anders Arpteg
Mikael Huss
FAtt
23
6
0
19 Jul 2019
Towards Understanding Generalization in Gradient-Based Meta-Learning
Simon Guiroy
Vikas Verma
C. Pal
73
21
0
16 Jul 2019
Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems
Mark D. McDonnell
Hesham Mostafa
Runchun Wang
Andre van Schaik
MQ
44
2
0
16 Jul 2019
Learning Neural Networks with Adaptive Regularization
Han Zhao
Yao-Hung Hubert Tsai
Ruslan Salakhutdinov
Geoffrey J. Gordon
42
15
0
14 Jul 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li
Colin Wei
Tengyu Ma
90
299
0
10 Jul 2019
Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale
A. G. Baydin
Lei Shao
W. Bhimji
Lukas Heinrich
Lawrence Meadows
...
Philip Torr
Victor W. Lee
Kyle Cranmer
P. Prabhat
Frank Wood
73
58
0
08 Jul 2019
Stochastic Gradient and Langevin Processes
Xiang Cheng
Dong Yin
Peter L. Bartlett
Michael I. Jordan
64
5
0
07 Jul 2019
Time-to-Event Prediction with Neural Networks and Cox Regression
Håvard Kvamme
Ørnulf Borgan
Ida Scheel
383
337
0
01 Jul 2019
Deep Gamblers: Learning to Abstain with Portfolio Theory
Liu Ziyin
Zhikang T. Wang
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
Masahito Ueda
109
113
0
29 Jun 2019
On improving deep learning generalization with adaptive sparse connectivity
Shiwei Liu
Decebal Constantin Mocanu
Mykola Pechenizkiy
ODL
39
8
0
27 Jun 2019
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
Kosuke Haruki
Taiji Suzuki
Yohei Hamakawa
Takeshi Toda
Ryuji Sakai
M. Ozawa
Mitsuhiro Kimura
ODL
61
17
0
26 Jun 2019
The Difficulty of Training Sparse Neural Networks
Utku Evci
Fabian Pedregosa
Aidan Gomez
Erich Elsen
72
101
0
25 Jun 2019
Is It Worth the Attention? A Comparative Evaluation of Attention Layers for Argument Unit Segmentation
Maximilian Spliethover
Jonas Klaff
Hendrik Heuer
43
10
0
24 Jun 2019
First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
T. H. Nguyen
Umut Simsekli
Mert Gurbuzbalaban
G. Richard
79
65
0
21 Jun 2019
On the interplay between noise and curvature and its effect on optimization and generalization
Valentin Thomas
Fabian Pedregosa
B. V. Merrienboer
Pierre-Antoine Manzagol
Yoshua Bengio
Nicolas Le Roux
52
61
0
18 Jun 2019
On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu
Wenqing Hu
Haoyi Xiong
Jun Huan
Vladimir Braverman
Zhanxing Zhu
MLT
70
10
0
18 Jun 2019
A Survey of Optimization Methods from a Machine Learning Perspective
Shiliang Sun
Zehui Cao
Han Zhu
Jing Zhao
82
562
0
17 Jun 2019
Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias
Stéphane d'Ascoli
Levent Sagun
Joan Bruna
Giulio Biroli
85
37
0
16 Jun 2019
Learning to Forget for Meta-Learning
Sungyong Baik
Seokil Hong
Kyoung Mu Lee
CLL, KELM
75
89
0
13 Jun 2019
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
Samet Oymak
Zalan Fabian
Mingchen Li
Mahdi Soltanolkotabi
MLT
87
88
0
12 Jun 2019
Semi-flat minima and saddle points by embedding neural networks to overparameterization
Kenji Fukumizu
Shoichiro Yamaguchi
Yoh-ichi Mototake
Mirai Tanaka
3DPC
64
25
0
12 Jun 2019
Large Scale Structure of Neural Network Loss Landscapes
Stanislav Fort
Stanislaw Jastrzebski
72
84
0
11 Jun 2019
The Generalization-Stability Tradeoff In Neural Network Pruning
Brian Bartoldson
Ari S. Morcos
Adrian Barbu
G. Erlebacher
94
76
0
09 Jun 2019
The Implicit Bias of AdaGrad on Separable Data
Qian Qian
Xiaoyuan Qian
70
23
0
09 Jun 2019
Understanding Generalization through Visualizations
Wenjie Huang
Z. Emam
Micah Goldblum
Liam H. Fowl
J. K. Terry
Furong Huang
Tom Goldstein
AI4CE
51
80
0
07 Jun 2019
Inductive Bias of Gradient Descent based Adversarial Training on Separable Data
Yan Li
Ethan X. Fang
Huan Xu
T. Zhao
78
16
0
07 Jun 2019
The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida
S. Akaho
S. Amari
73
41
0
07 Jun 2019
Fault Diagnosis of Rotary Machines using Deep Convolutional Neural Network with three axis signal input
Davor Kolar
D. Lisjak
M. Pająk
D. Pavković
24
0
0
06 Jun 2019
On the Convergence of SARAH and Beyond
Bingcong Li
Meng Ma
G. Giannakis
68
27
0
05 Jun 2019
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Devansh Arpit
Victor Campos
Yoshua Bengio
75
59
0
05 Jun 2019
Deep Q-Learning for Directed Acyclic Graph Generation
Laura D'Arcy
P. Corcoran
Alun D. Preece
BDL, GNN
26
5
0
05 Jun 2019
An Empirical Study on Hyperparameters and their Interdependence for RL Generalization
Xingyou Song
Yilun Du
Jacob Jackson
AI4CE
43
8
0
02 Jun 2019
Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora
Nadav Cohen
Wei Hu
Yuping Luo
AI4CE
111
509
0
31 May 2019
Luck Matters: Understanding Training Dynamics of Deep ReLU Networks
Yuandong Tian
Tina Jiang
Qucheng Gong
Ari S. Morcos
169
25
0
31 May 2019
Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience
Vaishnavh Nagarajan
J. Zico Kolter
102
101
0
30 May 2019
Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Aditya Golatkar
Alessandro Achille
Stefano Soatto
80
97
0
30 May 2019
Meta Dropout: Learning to Perturb Features for Generalization
Haebeom Lee
Taewook Nam
Eunho Yang
Sung Ju Hwang
OOD
59
3
0
30 May 2019
Mixed Precision Training With 8-bit Floating Point
Naveen Mellempudi
Sudarshan Srinivasan
Dipankar Das
Bharat Kaul
MQ
78
69
0
29 May 2019
Where is the Information in a Deep Neural Network?
Alessandro Achille
Giovanni Paolini
Stefano Soatto
85
82
0
29 May 2019
High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks
Haohan Wang
Xindi Wu
Pengcheng Yin
Eric Xing
77
526
0
28 May 2019
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Tianle Cai
Ruiqi Gao
Jikai Hou
Siyu Chen
Dong Wang
Di He
Zhihua Zhang
Liwei Wang
ODL
67
57
0
28 May 2019
SGD on Neural Networks Learns Functions of Increasing Complexity
Preetum Nakkiran
Gal Kaplun
Dimitris Kalimeris
Tristan Yang
Benjamin L. Edelman
Fred Zhang
Boaz Barak
MLT
140
248
0
28 May 2019
Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness
Pengzhan Jin
Lu Lu
Yifa Tang
George Karniadakis
65
60
0
27 May 2019
Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks
Guy Uziel
BDL
44
0
0
26 May 2019