ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

v1, v2 (latest)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
arXiv: 1609.04836

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,585 papers shown
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
George Philipp
Basel Alomair
J. Carbonell
ODL
136
48
0
15 Dec 2017
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami
A. Azad
Peter H. Jin
Kurt Keutzer
A. Buluç
127
85
0
12 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
70
20
0
08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
167
137
0
06 Dec 2017
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
Cong Ma
Kaizheng Wang
Yuejie Chi
Yuxin Chen
218
249
0
28 Nov 2017
Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms
Yazhen Wang
127
17
0
27 Nov 2017
Critical Learning Periods in Deep Neural Networks
Alessandro Achille
Matteo Rovere
Stefano Soatto
114
104
0
24 Nov 2017
Deep supervised learning using local errors
Hesham Mostafa
V. Ramesh
Gert Cauwenberghs
115
123
0
17 Nov 2017
A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit
S. Cho
Sunghun Kang
Chang D. Yoo
102
1
0
17 Nov 2017
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
239
2,257
0
14 Nov 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
245
474
0
13 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu
Damian Podareanu
V. Saletore
88
55
0
12 Nov 2017
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
Ron Amit
Ron Meir
BDL, MLT
226
179
0
03 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar
D. Sreedhar
Vaibhav Saxena
Yogish Sabharwal
Ashish Verma
67
5
0
02 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
404
1,026
0
01 Nov 2017
Deep Learning as a Mixed Convex-Combinatorial Optimization Problem
A. Friesen
Pedro M. Domingos
65
20
0
31 Oct 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari
Stefano Soatto
MLT
156
306
0
30 Oct 2017
The Implicit Bias of Gradient Descent on Separable Data
Daniel Soudry
Elad Hoffer
Mor Shpigel Nacson
Suriya Gunasekar
Nathan Srebro
566
947
0
27 Oct 2017
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin
Michael W. Mahoney
AI4CE
97
64
0
26 Oct 2017
Stability and Generalization of Learning Algorithms that Converge to Global Optima
Zachary B. Charles
Dimitris Papailiopoulos
MLT
126
169
0
23 Oct 2017
Function Norms and Regularization in Deep Networks
Amal Rannen Triki
Maxim Berman
Matthew B. Blaschko
100
2
0
18 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith
Quoc V. Le
BDL
225
262
0
17 Oct 2017
Searching for Activation Functions
Prajit Ramachandran
Barret Zoph
Quoc V. Le
148
624
0
16 Oct 2017
Generalization in Deep Learning
Kenji Kawaguchi
L. Kaelbling
Yoshua Bengio
ODL
419
466
0
16 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
66
10
0
10 Oct 2017
SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks
Martim Brandao
K. Hashimoto
A. Takanishi
78
6
0
09 Oct 2017
Neural Optimizer Search with Reinforcement Learning
Irwan Bello
Barret Zoph
Vijay Vasudevan
Quoc V. Le
ODL
150
390
0
21 Sep 2017
ImageNet Training in Minutes
Yang You
Zhao-jie Zhang
Cho-Jui Hsieh
J. Demmel
Kurt Keutzer
VLM, LRM
214
57
0
14 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
V. Patel
MLT
80
8
0
14 Sep 2017
Normalized Direction-preserving Adam
Zijun Zhang
Lin Ma
Zongpeng Li
Chuan Wu
ODL
91
29
0
13 Sep 2017
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Eric Martin
Chris Cundy
174
127
0
12 Sep 2017
Implicit Regularization in Deep Learning
Behnam Neyshabur
211
151
0
06 Sep 2017
Unsupervised feature learning with discriminative encoder
Gaurav Pandey
Ambedkar Dukkipati
SSL
76
6
0
03 Sep 2017
Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB
Aitor Alvarez-Gila
Joost van de Weijer
Estíbaliz Garrote
GAN
95
95
0
01 Sep 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
247
521
0
23 Aug 2017
Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
Thorsten Kurth
Jian Zhang
N. Satish
Ioannis Mitliagkas
Evan Racah
...
J. Deslippe
Mikhail Shiryaev
Srinivas Sridharan
P. Prabhat
Pradeep Dubey
106
83
0
17 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
340
874
0
13 Aug 2017
Scaling Deep Learning on GPU and Knights Landing clusters
Yang You
A. Buluç
J. Demmel
GNN
87
77
0
09 Aug 2017
Video Frame Interpolation via Adaptive Separable Convolution
Simon Niklaus
Long Mai
Feng Liu
200
714
0
05 Aug 2017
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
Nils Reimers
Iryna Gurevych
149
438
0
31 Jul 2017
Analysis and Optimization of Convolutional Neural Network Architectures
Martin Thoma
137
74
0
31 Jul 2017
Mini-batch Tempered MCMC
Dangna Li
W. Wong
181
8
0
31 Jul 2017
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur
Srinadh Bhojanapalli
Nathan Srebro
201
621
0
29 Jul 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML, ODL
147
45
0
26 Jul 2017
Tensor-Based Backpropagation in Neural Networks with Non-Sequential Input
Hirsh R. Agarwal
Andrew Huang
53
0
0
13 Jul 2017
Pedestrian Alignment Network for Large-scale Person Re-identification
Zhedong Zheng
Liang Zheng
Yi Yang
146
479
0
03 Jul 2017
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
Lei Wu
Zhanxing Zhu
E. Weinan
ODL
133
221
0
30 Jun 2017
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
346
1,293
0
27 Jun 2017
Efficiency of quantum versus classical annealing in non-convex learning problems
Carlo Baldassi
R. Zecchina
140
48
0
26 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
115
11
0
18 Jun 2017