ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.


On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Title
Implicit Regularization in Deep Learning
Behnam Neyshabur
66
148
0
06 Sep 2017
Unsupervised feature learning with discriminative encoder
Gaurav Pandey
Ambedkar Dukkipati
SSL
41
6
0
03 Sep 2017
Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB
Aitor Alvarez-Gila
Joost van de Weijer
Estíbaliz Garrote
GAN
81
92
0
01 Sep 2017
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith
Nicholay Topin
AI4CE
106
520
0
23 Aug 2017
Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
Thorsten Kurth
Jian Zhang
N. Satish
Ioannis Mitliagkas
Evan Racah
...
J. Deslippe
Mikhail Shiryaev
Srinivas Sridharan
P. Prabhat
Pradeep Dubey
71
83
0
17 Aug 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
163
853
0
13 Aug 2017
Scaling Deep Learning on GPU and Knights Landing clusters
Yang You
A. Buluç
J. Demmel
GNN
66
75
0
09 Aug 2017
Video Frame Interpolation via Adaptive Separable Convolution
Simon Niklaus
Long Mai
Feng Liu
113
701
0
05 Aug 2017
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
Nils Reimers
Iryna Gurevych
87
437
0
31 Jul 2017
Analysis and Optimization of Convolutional Neural Network Architectures
Martin Thoma
99
73
0
31 Jul 2017
Mini-batch Tempered MCMC
Dangna Li
W. Wong
53
6
0
31 Jul 2017
A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur
Srinadh Bhojanapalli
Nathan Srebro
92
610
0
29 Jul 2017
A Robust Multi-Batch L-BFGS Method for Machine Learning
A. Berahas
Martin Takáč
AAML ODL
111
44
0
26 Jul 2017
Tensor-Based Backpropagation in Neural Networks with Non-Sequential Input
Hirsh R. Agarwal
Andrew Huang
19
0
0
13 Jul 2017
Pedestrian Alignment Network for Large-scale Person Re-identification
Zhedong Zheng
Liang Zheng
Yi Yang
123
479
0
03 Jul 2017
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
Lei Wu
Zhanxing Zhu
E. Weinan
ODL
69
221
0
30 Jun 2017
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
167
1,260
0
27 Jun 2017
Efficiency of quantum versus classical annealing in non-convex learning problems
Carlo Baldassi
R. Zecchina
54
44
0
26 Jun 2017
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
77
11
0
18 Jun 2017
A Closer Look at Memorization in Deep Networks
Devansh Arpit
Stanislaw Jastrzebski
Nicolas Ballas
David M. Krueger
Emmanuel Bengio
...
Tegan Maharaj
Asja Fischer
Aaron Courville
Yoshua Bengio
Simon Lacoste-Julien
TDI
144
1,830
0
16 Jun 2017
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun
Utku Evci
V. U. Güney
Yann N. Dauphin
Léon Bottou
95
420
0
14 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
132
3,688
0
08 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
50
25
0
07 Jun 2017
Deep Mutual Learning
Ying Zhang
Tao Xiang
Timothy M. Hospedales
Huchuan Lu
FedML
158
1,657
0
01 Jun 2017
Spectral Norm Regularization for Improving the Generalizability of Deep Learning
Yuichi Yoshida
Takeru Miyato
91
335
0
31 May 2017
Implicit Regularization in Matrix Factorization
Suriya Gunasekar
Blake E. Woodworth
Srinadh Bhojanapalli
Behnam Neyshabur
Nathan Srebro
89
493
0
25 May 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
194
803
0
24 May 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
96
1,034
0
23 May 2017
TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen
Cong Xu
Feng Yan
Chunpeng Wu
Yandan Wang
Yiran Chen
Hai Helen Li
207
990
0
22 May 2017
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles
Philipp Hennig
100
169
0
22 May 2017
On the diffusion approximation of nonconvex stochastic gradient descent
Junyang Qian
C. J. Li
Lei Li
Jianguo Liu
DiffM
76
24
0
22 May 2017
Shake-Shake regularization
Xavier Gastaldi
3DPC BDL OOD
96
380
0
21 May 2017
Shallow Updates for Deep Reinforcement Learning
Nir Levine
Tom Zahavy
D. Mankowitz
Aviv Tamar
Shie Mannor
OffRL
72
48
0
21 May 2017
Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions
Kleomenis Katevas
Ilias Leontiadis
M. Pielot
Joan Serrà
HAI
51
12
0
17 May 2017
Stable Architectures for Deep Neural Networks
E. Haber
Lars Ruthotto
159
735
0
09 May 2017
Nonlinear Information Bottleneck
Artemy Kolchinsky
Brendan D. Tracey
David Wolpert
68
157
0
06 May 2017
Unsupervised prototype learning in an associative-memory network
Huiling Zhen
Shang-Nan Wang
Haijun Zhou
SSL
22
1
0
10 Apr 2017
Snapshot Ensembles: Train 1, get M for free
Gao Huang
Yixuan Li
Geoff Pleiss
Zhuang Liu
John E. Hopcroft
Kilian Q. Weinberger
OOD FedML UQCV
150
954
0
01 Apr 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
Gintare Karolina Dziugaite
Daniel M. Roy
126
820
0
31 Mar 2017
Sharp Minima Can Generalize For Deep Nets
Laurent Dinh
Razvan Pascanu
Samy Bengio
Yoshua Bengio
ODL
147
774
0
15 Mar 2017
Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
Nanyang Ye
Zhanxing Zhu
Rafał K. Mantiuk
92
21
0
13 Mar 2017
Data-Dependent Stability of Stochastic Gradient Descent
Ilja Kuzborskij
Christoph H. Lampert
MLT
142
166
0
05 Mar 2017
Training Language Models Using Target-Propagation
Sam Wiseman
S. Chopra
Marc'Aurelio Ranzato
Arthur Szlam
Ruoyu Sun
Soumith Chintala
Nicolas Vasilache
59
8
0
15 Feb 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
Iacer Calixto
Qun Liu
Nick Campbell
136
156
0
23 Jan 2017
Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization
Valentin Dalibard
Michael Schaarschmidt
Eiko Yoneki
24
2
0
01 Dec 2016
Towards Robust Deep Neural Networks with BANG
Andras Rozsa
Manuel Günther
Terrance E. Boult
AAML OOD
86
76
0
01 Dec 2016
GaDei: On Scale-up Training As A Service For Deep Learning
Wei Zhang
Minwei Feng
Yunhui Zheng
Yufei Ren
Yandong Wang
...
Peng Liu
Bing Xiang
Li Zhang
Bowen Zhou
Fei Wang
ALM
67
10
0
18 Nov 2016
Incremental Sequence Learning
E. Jong
CLL
33
5
0
09 Nov 2016
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
ODL
104
775
0
06 Nov 2016
Big Batch SGD: Automated Inference using Adaptive Batch Sizes
Soham De
A. Yadav
David Jacobs
Tom Goldstein
ODL
177
62
0
18 Oct 2016