1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
Sample Complexity Bounds for Recurrent Neural Networks with Application to Combinatorial Graph Problems
Nil-Jana Akpinar
Bernhard Kratzwald
Stefan Feuerriegel
GNN
27
9
0
29 Jan 2019
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani
Shankar Krishnan
Ying Xiao
ODL
101
326
0
29 Jan 2019
Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning
Nicolas García Trillos
Zachary T. Kaplan
D. Sanz-Alonso
ODL
57
3
0
29 Jan 2019
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample
A. Berahas
Majid Jahani
Peter Richtárik
Martin Takáč
102
41
0
28 Jan 2019
Augment your batch: better training with larger batches
Elad Hoffer
Tal Ben-Nun
Itay Hubara
Niv Giladi
Torsten Hoefler
Daniel Soudry
ODL
118
76
0
27 Jan 2019
Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Charles H. Martin
Michael W. Mahoney
96
126
0
24 Jan 2019
Large-Batch Training for LSTM and Beyond
Yang You
Jonathan Hseu
Chris Ying
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
56
91
0
24 Jan 2019
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan
82
88
0
24 Jan 2019
Decoupled Greedy Learning of CNNs
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon
77
117
0
23 Jan 2019
Visualized Insights into the Optimization Landscape of Fully Convolutional Networks
Jianjie Lu
K. Tong
100
12
0
20 Jan 2019
Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
Zhi-Qin John Xu
Yaoyu Zhang
Yan Xiao
Zheng Ma
135
520
0
19 Jan 2019
Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent
Wenqing Hu
Zhanxing Zhu
Haoyi Xiong
Jun Huan
MLT
51
10
0
18 Jan 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli
Levent Sagun
Mert Gurbuzbalaban
111
252
0
18 Jan 2019
Ensemble Feature for Person Re-Identification
Jiabao Wang
Yang Li
Zhuang Miao
OOD
3DPC
88
1
0
17 Jan 2019
Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis
Yusuke Tsuzuku
Issei Sato
Masashi Sugiyama
84
77
0
15 Jan 2019
Neumann Networks for Inverse Problems in Imaging
Davis Gilton
Greg Ongie
Rebecca Willett
78
24
0
13 Jan 2019
Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions
Anna Sergeevna Bosman
A. Engelbrecht
Mardé Helbig
78
77
0
08 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Kai Zou
Paolo Costa
Peter R. Pietzuch
57
70
0
08 Jan 2019
Generalization in Deep Networks: The Role of Distance from Initialization
Vaishnavh Nagarajan
J. Zico Kolter
ODL
89
96
0
07 Jan 2019
Federated Learning via Over-the-Air Computation
Kai Yang
Tao Jiang
Yuanming Shi
Z. Ding
FedML
102
881
0
31 Dec 2018
A continuous-time analysis of distributed stochastic gradient
Nicholas M. Boffi
Jean-Jacques E. Slotine
46
15
0
28 Dec 2018
Improving Generalization of Deep Neural Networks by Leveraging Margin Distribution
Shen-Huan Lyu
Lu Wang
Zhi Zhou
34
11
0
27 Dec 2018
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
Samet Oymak
Mahdi Soltanolkotabi
ODL
73
177
0
25 Dec 2018
Trust Region Based Adversarial Attack on Neural Networks
Z. Yao
A. Gholami
Peng Xu
Kurt Keutzer
Michael W. Mahoney
AAML
59
54
0
16 Dec 2018
An Empirical Model of Large-Batch Training
Sam McCandlish
Jared Kaplan
Dario Amodei
OpenAI Dota Team
76
280
0
14 Dec 2018
An Empirical Study of Example Forgetting during Deep Neural Network Learning
Mariya Toneva
Alessandro Sordoni
Rémi Tachet des Combes
Adam Trischler
Yoshua Bengio
Geoffrey J. Gordon
173
743
0
12 Dec 2018
Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training
Saurabh N. Adya
Vinay Palakkode
Oncel Tuzel
39
4
0
07 Dec 2018
Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
Xiaowu Dai
Yuhua Zhu
75
11
0
03 Dec 2018
Stochastic Training of Residual Networks: a Differential Equation Viewpoint
Qi Sun
Yunzhe Tao
Q. Du
71
24
0
01 Dec 2018
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant
N. Vemuri
Z. Yao
Vladimir Feinberg
A. Gholami
Kai Rothauge
Michael W. Mahoney
Joseph E. Gonzalez
92
73
0
30 Nov 2018
On Implicit Filter Level Sparsity in Convolutional Neural Networks
Dushyant Mehta
K. Kim
Christian Theobalt
84
28
0
29 Nov 2018
3D human pose estimation in video with temporal convolutions and semi-supervised training
Dario Pavllo
Christoph Feichtenhofer
David Grangier
Michael Auli
3DH
81
1,015
0
28 Nov 2018
Understanding the impact of entropy on policy optimization
Zafarali Ahmed
Nicolas Le Roux
Mohammad Norouzi
Dale Schuurmans
81
238
0
27 Nov 2018
Dense xUnit Networks
I. Kligvasser
T. Michaeli
92
3
0
27 Nov 2018
Forward Stability of ResNet and Its Variants
Linan Zhang
Hayden Schaeffer
111
48
0
24 Nov 2018
Self-Referenced Deep Learning
Xu Lan
Xiatian Zhu
S. Gong
123
24
0
19 Nov 2018
Generalizable Adversarial Training via Spectral Normalization
Farzan Farnia
Jesse M. Zhang
David Tse
OOD
AAML
83
140
0
19 Nov 2018
Image Classification at Supercomputer Scale
Chris Ying
Sameer Kumar
Dehao Chen
Tao Wang
Youlong Cheng
VLM
61
123
0
16 Nov 2018
Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami
Hisahiro Suganuma
Pongsakorn U-chupala
Yoshiki Tanaka
Yuichi Kageyama
70
77
0
13 Nov 2018
Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue
Jaehoon Lee
J. Antognini
J. Mamou
J. Ketterling
Yao Wang
102
409
0
08 Nov 2018
Bias and Generalization in Deep Generative Models: An Empirical Study
Shengjia Zhao
Hongyu Ren
Arianna Yuan
Jiaming Song
Noah D. Goodman
Stefano Ermon
AI4CE
67
137
0
08 Nov 2018
Characterizing Well-Behaved vs. Pathological Deep Neural Networks
Mitchell Stern
34
0
0
07 Nov 2018
A Closer Look at Deep Policy Gradients
Andrew Ilyas
Logan Engstrom
Shibani Santurkar
Dimitris Tsipras
Firdaus Janoos
Larry Rudolph
Aleksander Madry
87
51
0
06 Nov 2018
Nonlinear Collaborative Scheme for Deep Neural Networks
Hui-Ling Zhen
Xi Lin
Alan Tang
Zhenhua Li
Qingfu Zhang
Sam Kwong
64
4
0
04 Nov 2018
Classification of Findings with Localized Lesions in Fundoscopic Images using a Regionally Guided CNN
Jaemin Son
Woong Bae
Sangkeun Kim
S. Park
Kyu-Hwan Jung
20
17
0
02 Nov 2018
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Anish Acharya
Rahul Goel
A. Metallinou
Inderjit Dhillon
94
62
0
01 Nov 2018
Multi-Label Robust Factorization Autoencoder and its Application in Predicting Drug-Drug Interactions
Xu Chu
Yang Lin
Jingyue Gao
Jiangtao Wang
Yasha Wang
Leye Wang
OOD
20
4
0
01 Nov 2018
Democratizing Production-Scale Distributed Deep Learning
Minghuang Ma
Hadi Pouransari
Daniel Chao
Saurabh N. Adya
S. Serrano
Yi Qin
Dan Gimnicher
Dominic Walsh
MoE
96
6
0
31 Oct 2018
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare
N. Keskar
Caiming Xiong
R. Socher
ODL
97
277
0
29 Oct 2018
Three Mechanisms of Weight Decay Regularization
Guodong Zhang
Chaoqi Wang
Bowen Xu
Roger C. Grosse
75
259
0
29 Oct 2018