1609.04836
Cited By
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Implicit Regularization in Deep Learning
Behnam Neyshabur
66
148
0
06 Sep 2017
Unsupervised feature learning with discriminative encoder
Gaurav Pandey
Ambedkar Dukkipati
SSL
41
6
0
03 Sep 2017
Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB
Aitor Alvarez-Gila
Joost van de Weijer
Estíbaliz Garrote
GAN
81
92
0
01 Sep 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
106
520
0
23 Aug 2017
Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
Thorsten Kurth
Jian Zhang
N. Satish
Ioannis Mitliagkas
Evan Racah
...
J. Deslippe
Mikhail Shiryaev
Srinivas Sridharan
P. Prabhat
Pradeep Dubey
71
83
0
17 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
163
853
0
13 Aug 2017
Scaling Deep Learning on GPU and Knights Landing clusters
Yang You
A. Buluç
J. Demmel
GNN
66
75
0
09 Aug 2017
Video Frame Interpolation via Adaptive Separable Convolution
Simon Niklaus
Long Mai
Feng Liu
113
701
0
05 Aug 2017
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
Nils Reimers
Iryna Gurevych
87
437
0
31 Jul 2017
Analysis and Optimization of Convolutional Neural Network Architectures
Martin Thoma
99
73
0
31 Jul 2017
Mini-batch Tempered MCMC
Dangna Li
W. Wong
53
6
0
31 Jul 2017
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur
Srinadh Bhojanapalli
Nathan Srebro
92
610
0
29 Jul 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML
ODL
111
44
0
26 Jul 2017
Tensor-Based Backpropagation in Neural Networks with Non-Sequential Input
Hirsh R. Agarwal
Andrew Huang
19
0
0
13 Jul 2017
Pedestrian Alignment Network for Large-scale Person Re-identification
Zhedong Zheng
Liang Zheng
Yi Yang
123
479
0
03 Jul 2017
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
Lei Wu
Zhanxing Zhu
Weinan E
ODL
69
221
0
30 Jun 2017
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
167
1,260
0
27 Jun 2017
Efficiency of quantum versus classical annealing in non-convex learning problems
Carlo Baldassi
R. Zecchina
54
44
0
26 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
77
11
0
18 Jun 2017
A Closer Look at Memorization in Deep Networks
Devansh Arpit
Stanislaw Jastrzebski
Nicolas Ballas
David M. Krueger
Emmanuel Bengio
...
Tegan Maharaj
Asja Fischer
Aaron Courville
Yoshua Bengio
Simon Lacoste-Julien
TDI
144
1,830
0
16 Jun 2017
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun
Utku Evci
V. U. Güney
Yann N. Dauphin
Léon Bottou
95
420
0
14 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
132
3,688
0
08 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
50
25
0
07 Jun 2017
Deep Mutual Learning
Ying Zhang
Tao Xiang
Timothy M. Hospedales
Huchuan Lu
FedML
158
1,657
0
01 Jun 2017
Spectral Norm Regularization for Improving the Generalizability of Deep Learning
Yuichi Yoshida
Takeru Miyato
91
335
0
31 May 2017
Implicit Regularization in Matrix Factorization
Suriya Gunasekar
Blake E. Woodworth
Srinadh Bhojanapalli
Behnam Neyshabur
Nathan Srebro
89
493
0
25 May 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
194
803
0
24 May 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
96
1,034
0
23 May 2017
TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen
Cong Xu
Feng Yan
Chunpeng Wu
Yandan Wang
Yiran Chen
Hai Helen Li
207
990
0
22 May 2017
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles
Philipp Hennig
100
169
0
22 May 2017
On the diffusion approximation of nonconvex stochastic gradient descent
Junyang Qian
C. J. Li
Lei Li
Jianguo Liu
DiffM
76
24
0
22 May 2017
Shake-Shake regularization
Xavier Gastaldi
3DPC
BDL
OOD
96
380
0
21 May 2017
Shallow Updates for Deep Reinforcement Learning
Nir Levine
Tom Zahavy
D. Mankowitz
Aviv Tamar
Shie Mannor
OffRL
72
48
0
21 May 2017
Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions
Kleomenis Katevas
Ilias Leontiadis
M. Pielot
Joan Serrà
HAI
51
12
0
17 May 2017
Stable Architectures for Deep Neural Networks
E. Haber
Lars Ruthotto
159
735
0
09 May 2017
Nonlinear Information Bottleneck
Artemy Kolchinsky
Brendan D. Tracey
David Wolpert
68
157
0
06 May 2017
Unsupervised prototype learning in an associative-memory network
Huiling Zhen
Shang-Nan Wang
Haijun Zhou
SSL
22
1
0
10 Apr 2017
Snapshot Ensembles: Train 1, get M for free
Gao Huang
Yixuan Li
Geoff Pleiss
Zhuang Liu
John E. Hopcroft
Kilian Q. Weinberger
OOD
FedML
UQCV
150
954
0
01 Apr 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
Gintare Karolina Dziugaite
Daniel M. Roy
126
820
0
31 Mar 2017
Sharp Minima Can Generalize For Deep Nets
Laurent Dinh
Razvan Pascanu
Samy Bengio
Yoshua Bengio
ODL
147
774
0
15 Mar 2017
Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
Nanyang Ye
Zhanxing Zhu
Rafał K. Mantiuk
92
21
0
13 Mar 2017
Data-Dependent Stability of Stochastic Gradient Descent
Ilja Kuzborskij
Christoph H. Lampert
MLT
142
166
0
05 Mar 2017
Training Language Models Using Target-Propagation
Sam Wiseman
S. Chopra
Marc'Aurelio Ranzato
Arthur Szlam
Ruoyu Sun
Soumith Chintala
Nicolas Vasilache
59
8
0
15 Feb 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
Iacer Calixto
Qun Liu
Nick Campbell
136
156
0
23 Jan 2017
Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization
Valentin Dalibard
Michael Schaarschmidt
Eiko Yoneki
24
2
0
01 Dec 2016
Towards Robust Deep Neural Networks with BANG
Andras Rozsa
Manuel Günther
Terrance E. Boult
AAML
OOD
86
76
0
01 Dec 2016
GaDei: On Scale-up Training As A Service For Deep Learning
Wei Zhang
Minwei Feng
Yunhui Zheng
Yufei Ren
Yandong Wang
...
Peng Liu
Bing Xiang
Li Zhang
Bowen Zhou
Fei Wang
ALM
67
10
0
18 Nov 2016
Incremental Sequence Learning
E. Jong
CLL
33
5
0
09 Nov 2016
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
ODL
104
775
0
06 Nov 2016
Big Batch SGD: Automated Inference using Adaptive Batch Sizes
Soham De
A. Yadav
David Jacobs
Tom Goldstein
ODL
177
62
0
18 Oct 2016