Papers
Communities
Organizations
Events
Blog
Pricing
Feedback
Contact Sales
Search
Open menu
Home
Papers
1609.04836
Cited By
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,585 papers shown
Title
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
George Philipp
Basel Alomair
J. Carbonell
ODL
136
48
0
15 Dec 2017
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami
A. Azad
Peter H. Jin
Kurt Keutzer
A. Buluç
127
85
0
12 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
70
20
0
08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
167
137
0
06 Dec 2017
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
Cong Ma
Kaizheng Wang
Yuejie Chi
Yuxin Chen
218
249
0
28 Nov 2017
Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms
Yazhen Wang
127
17
0
27 Nov 2017
Critical Learning Periods in Deep Neural Networks
Alessandro Achille
Matteo Rovere
Stefano Soatto
114
104
0
24 Nov 2017
Deep supervised learning using local errors
Hesham Mostafa
V. Ramesh
Gert Cauwenberghs
115
123
0
17 Nov 2017
A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit
S. Cho
Sunghun Kang
Chang D. Yoo
102
1
0
17 Nov 2017
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
239
2,257
0
14 Nov 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
245
474
0
13 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu
Damian Podareanu
V. Saletore
88
55
0
12 Nov 2017
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
Ron Amit
Ron Meir
BDL
MLT
226
179
0
03 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar
D. Sreedhar
Vaibhav Saxena
Yogish Sabharwal
Ashish Verma
67
5
0
02 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
404
1,026
0
01 Nov 2017
Deep Learning as a Mixed Convex-Combinatorial Optimization Problem
A. Friesen
Pedro M. Domingos
65
20
0
31 Oct 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari
Stefano Soatto
MLT
156
306
0
30 Oct 2017
The Implicit Bias of Gradient Descent on Separable Data
Daniel Soudry
Elad Hoffer
Mor Shpigel Nacson
Suriya Gunasekar
Nathan Srebro
566
947
0
27 Oct 2017
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin
Michael W. Mahoney
AI4CE
97
64
0
26 Oct 2017
Stability and Generalization of Learning Algorithms that Converge to Global Optima
Zachary B. Charles
Dimitris Papailiopoulos
MLT
126
169
0
23 Oct 2017
Function Norms and Regularization in Deep Networks
Amal Rannen Triki
Maxim Berman
Matthew B. Blaschko
100
2
0
18 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith
Quoc V. Le
BDL
225
262
0
17 Oct 2017
Searching for Activation Functions
Prajit Ramachandran
Barret Zoph
Quoc V. Le
148
624
0
16 Oct 2017
Generalization in Deep Learning
Kenji Kawaguchi
L. Kaelbling
Yoshua Bengio
ODL
419
466
0
16 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
66
10
0
10 Oct 2017
SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks
Martim Brandao
K. Hashimoto
A. Takanishi
78
6
0
09 Oct 2017
Neural Optimizer Search with Reinforcement Learning
Irwan Bello
Barret Zoph
Vijay Vasudevan
Quoc V. Le
ODL
150
390
0
21 Sep 2017
ImageNet Training in Minutes
Yang You
Zhao-jie Zhang
Cho-Jui Hsieh
J. Demmel
Kurt Keutzer
VLM
LRM
214
57
0
14 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
V. Patel
MLT
80
8
0
14 Sep 2017
Normalized Direction-preserving Adam
Zijun Zhang
Lin Ma
Zongpeng Li
Chuan Wu
ODL
91
29
0
13 Sep 2017
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Eric Martin
Chris Cundy
174
127
0
12 Sep 2017
Implicit Regularization in Deep Learning
Behnam Neyshabur
211
151
0
06 Sep 2017
Unsupervised feature learning with discriminative encoder
Gaurav Pandey
Ambedkar Dukkipati
SSL
76
6
0
03 Sep 2017
Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB
Aitor Alvarez-Gila
Joost van de Weijer
Estíbaliz Garrote
GAN
95
95
0
01 Sep 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
247
521
0
23 Aug 2017
Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
Thorsten Kurth
Jian Zhang
N. Satish
Ioannis Mitliagkas
Evan Racah
...
J. Deslippe
Mikhail Shiryaev
Srinivas Sridharan
P. Prabhat
Pradeep Dubey
106
83
0
17 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
340
874
0
13 Aug 2017
Scaling Deep Learning on GPU and Knights Landing clusters
Yang You
A. Buluç
J. Demmel
GNN
87
77
0
09 Aug 2017
Video Frame Interpolation via Adaptive Separable Convolution
Simon Niklaus
Long Mai
Feng Liu
200
714
0
05 Aug 2017
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
Nils Reimers
Iryna Gurevych
149
438
0
31 Jul 2017
Analysis and Optimization of Convolutional Neural Network Architectures
Martin Thoma
137
74
0
31 Jul 2017
Mini-batch Tempered MCMC
Dangna Li
W. Wong
181
8
0
31 Jul 2017
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur
Srinadh Bhojanapalli
Nathan Srebro
201
621
0
29 Jul 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML
ODL
147
45
0
26 Jul 2017
Tensor-Based Backpropagation in Neural Networks with Non-Sequential Input
Hirsh R. Agarwal
Andrew Huang
53
0
0
13 Jul 2017
Pedestrian Alignment Network for Large-scale Person Re-identification
Zhedong Zheng
Liang Zheng
Yi Yang
146
479
0
03 Jul 2017
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
Lei Wu
Zhanxing Zhu
E. Weinan
ODL
133
221
0
30 Jun 2017
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
346
1,293
0
27 Jun 2017
Efficiency of quantum versus classical annealing in non-convex learning problems
Carlo Baldassi
R. Zecchina
140
48
0
26 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
115
11
0
18 Jun 2017
Previous
1
2
3
...
30
31
32
Next