On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
Title
Unsupervised feature learning with discriminative encoder
Gaurav Pandey
Ambedkar Dukkipati
SSL
88
6
0
03 Sep 2017
Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB
Aitor Alvarez-Gila
Joost van de Weijer
Estíbaliz Garrote
GAN
166
104
0
01 Sep 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
411
526
0
23 Aug 2017
Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
Thorsten Kurth
Jian Zhang
N. Satish
Alexia Jolicoeur-Martineau
Evan Racah
...
J. Deslippe
Mikhail Shiryaev
Srinivas Sridharan
P. Prabhat
Pradeep Dubey
158
84
0
17 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
454
908
0
13 Aug 2017
Scaling Deep Learning on GPU and Knights Landing clusters
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2017
Yang You
A. Buluç
J. Demmel
GNN
119
80
0
09 Aug 2017
Video Frame Interpolation via Adaptive Separable Convolution
Simon Niklaus
Long Mai
Feng Liu
216
745
0
05 Aug 2017
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
Nils Reimers
Iryna Gurevych
209
447
0
31 Jul 2017
Analysis and Optimization of Convolutional Neural Network Architectures
Martin Thoma
177
76
0
31 Jul 2017
Mini-batch Tempered MCMC
Dangna Li
W. Wong
322
9
0
31 Jul 2017
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur
Srinadh Bhojanapalli
Nathan Srebro
241
634
0
29 Jul 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML
ODL
187
47
0
26 Jul 2017
Tensor-Based Backpropagation in Neural Networks with Non-Sequential Input
Hirsh R. Agarwal
Andrew Huang
73
0
0
13 Jul 2017
Pedestrian Alignment Network for Large-scale Person Re-identification
Zhedong Zheng
Liang Zheng
Yi Yang
186
486
0
03 Jul 2017
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
Lei Wu
Zhanxing Zhu
Weinan E
ODL
177
228
0
30 Jun 2017
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
450
1,346
0
27 Jun 2017
Efficiency of quantum versus classical annealing in non-convex learning problems
Carlo Baldassi
R. Zecchina
176
52
0
26 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
163
12
0
18 Jun 2017
A Closer Look at Memorization in Deep Networks
Devansh Arpit
Stanislaw Jastrzebski
Nicolas Ballas
David M. Krueger
Emmanuel Bengio
...
Tegan Maharaj
Asja Fischer
Aaron Courville
Yoshua Bengio
Damien Scieur
TDI
513
2,026
0
16 Jun 2017
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
International Conference on Learning Representations (ICLR), 2017
Levent Sagun
Utku Evci
V. U. Güney
Yann N. Dauphin
Léon Bottou
279
440
0
14 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
534
3,915
0
08 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
131
27
0
07 Jun 2017
Deep Mutual Learning
Ying Zhang
Tao Xiang
Timothy M. Hospedales
Huchuan Lu
FedML
524
1,852
0
01 Jun 2017
Spectral Norm Regularization for Improving the Generalizability of Deep Learning
Yuichi Yoshida
Takeru Miyato
209
374
0
31 May 2017
Implicit Regularization in Matrix Factorization
Suriya Gunasekar
Blake E. Woodworth
Srinadh Bhojanapalli
Behnam Neyshabur
Nathan Srebro
242
527
0
25 May 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
411
844
0
24 May 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
345
1,102
0
23 May 2017
TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen
Cong Xu
Feng Yan
Chunpeng Wu
Yandan Wang
Yiran Chen
Hai Helen Li
377
1,033
0
22 May 2017
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles
Philipp Hennig
311
197
0
22 May 2017
On the diffusion approximation of nonconvex stochastic gradient descent
Junyang Qian
C. J. Li
Lei Li
Jianguo Liu
DiffM
190
24
0
22 May 2017
Shake-Shake regularization
Xavier Gastaldi
3DPC
BDL
OOD
343
391
0
21 May 2017
Shallow Updates for Deep Reinforcement Learning
Nir Levine
Tom Zahavy
D. Mankowitz
Aviv Tamar
Shie Mannor
OffRL
167
48
0
21 May 2017
Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions
Kleomenis Katevas
Ilias Leontiadis
M. Pielot
Joan Serrà
HAI
104
12
0
17 May 2017
Stable Architectures for Deep Neural Networks
E. Haber
Lars Ruthotto
624
784
0
09 May 2017
Nonlinear Information Bottleneck
Artemy Kolchinsky
Brendan D. Tracey
David Wolpert
369
174
0
06 May 2017
Unsupervised prototype learning in an associative-memory network
Huiling Zhen
Shang-Nan Wang
Haijun Zhou
SSL
79
1
0
10 Apr 2017
Snapshot Ensembles: Train 1, get M for free
Gao Huang
Shouqing Yang
Geoff Pleiss
Zhuang Liu
John E. Hopcroft
Kilian Q. Weinberger
OOD
FedML
UQCV
506
1,025
0
01 Apr 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
Gintare Karolina Dziugaite
Daniel M. Roy
370
882
0
31 Mar 2017
Sharp Minima Can Generalize For Deep Nets
Laurent Dinh
Razvan Pascanu
Samy Bengio
Yoshua Bengio
ODL
369
827
0
15 Mar 2017
Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
Nanyang Ye
Zhanxing Zhu
Rafał K. Mantiuk
218
21
0
13 Mar 2017
Data-Dependent Stability of Stochastic Gradient Descent
Ilja Kuzborskij
Christoph H. Lampert
MLT
363
174
0
05 Mar 2017
Training Language Models Using Target-Propagation
Sam Wiseman
S. Chopra
Marc'Aurelio Ranzato
Arthur Szlam
Tian Ding
Soumith Chintala
Nicolas Vasilache
109
9
0
15 Feb 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017
Iacer Calixto
Qun Liu
Nick Campbell
171
163
0
23 Jan 2017
Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization
Valentin Dalibard
Michael Schaarschmidt
Eiko Yoneki
82
2
0
01 Dec 2016
Towards Robust Deep Neural Networks with BANG
Andras Rozsa
Manuel Günther
Terrance E. Boult
AAML
OOD
244
77
0
01 Dec 2016
GaDei: On Scale-up Training As A Service For Deep Learning
Wei Zhang
Minwei Feng
Yunhui Zheng
Yufei Ren
Yandong Wang
...
Peng Liu
Bing Xiang
Li Zhang
Bowen Zhou
Haiwei Yang
ALM
169
10
0
18 Nov 2016
Incremental Sequence Learning
E. Jong
CLL
125
5
0
09 Nov 2016
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
ODL
494
832
0
06 Nov 2016
Big Batch SGD: Automated Inference using Adaptive Batch Sizes
Soham De
A. Yadav
David Jacobs
Tom Goldstein
ODL
406
63
0
18 Oct 2016
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
J. Keuper
Franz-Josef Pfreundt
GNN
289
102
0
22 Sep 2016