Train longer, generalize better: closing the generalization gap in large batch training of neural networks
arXiv:1705.08741, 24 May 2017
Elad Hoffer, Itay Hubara, Daniel Soudry
[ODL]

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Showing 50 of 465 citing papers (page 9 of 10).
  • Layer rotation: a surprisingly powerful indicator of generalization in deep networks? [MLT]
    Simon Carbonnelle, Christophe De Vleeschouwer. 05 Jun 2018.
  • Backdrop: Stochastic Backpropagation
    Siavash Golkar, Kyle Cranmer. 04 Jun 2018.
  • Implicit Bias of Gradient Descent on Linear Convolutional Networks [MDE]
    Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 01 Jun 2018.
  • Scaling Neural Machine Translation [AIMat]
    Myle Ott, Sergey Edunov, David Grangier, Michael Auli. 01 Jun 2018.
  • Understanding Batch Normalization
    Johan Bjorck, Daniel Schwalbe-Koda, B. Selman, Kilian Q. Weinberger. 01 Jun 2018.
  • SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
    W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, Xue Yang. 21 May 2018.
  • Norm-Preservation: Why Residual Networks Can Become Extremely Deep?
    Alireza Zaeemzadeh, Nazanin Rahnavard, M. Shah. 18 May 2018.
  • HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering
    Daniel Gribel, Thibaut Vidal. 25 Apr 2018.
  • Revisiting Small Batch Training for Deep Neural Networks [ODL]
    Dominic Masters, Carlo Luschi. 20 Apr 2018.
  • Improving Confidence Estimates for Unfamiliar Examples
    Zhizhong Li, Derek Hoiem. 09 Apr 2018.
  • Training Tips for the Transformer Model
    Martin Popel, Ondrej Bojar. 01 Apr 2018.
  • Normalization of Neural Networks using Analytic Variance Propagation
    Alexander Shekhovtsov, B. Flach. 28 Mar 2018.
  • Comparing Dynamics: Deep Neural Networks versus Glassy Systems [AI4CE]
    Carlo Albert, Levent Sagun, Mario Geiger, S. Spigler, Gerard Ben Arous, C. Cammarota, Yann LeCun, Matthieu Wyart, Giulio Biroli. 19 Mar 2018.
  • High Throughput Synchronous Distributed Stochastic Gradient Descent
    Michael Teng, Frank Wood. 12 Mar 2018.
  • Convergence of Gradient Descent on Separable Data
    Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry. 05 Mar 2018.
  • Norm matters: efficient and accurate normalization schemes in deep networks [OffRL]
    Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry. 05 Mar 2018.
  • The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
    Zhanxing Zhu, Jingfeng Wu, Ting Yu, Lei Wu, Jin Ma. 01 Mar 2018.
  • Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis [GNN]
    Tal Ben-Nun, Torsten Hoefler. ACM Computing Surveys (CSUR), 2018. 26 Feb 2018.
  • A Walk with SGD
    Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio. 24 Feb 2018.
  • Characterizing Implicit Bias in Terms of Optimization Geometry [AI4CE]
    Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 22 Feb 2018.
  • The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
    Nicholas Carlini, Chang-rui Liu, Ulfar Erlingsson, Jernej Kos, Basel Alomair. 22 Feb 2018.
  • Generalization in Machine Learning via Analytical Learning Theory
    Kenji Kawaguchi, Yoshua Bengio, Vikas Verma, Leslie Pack Kaelbling. 21 Feb 2018.
  • An Alternative View: When Does SGD Escape Local Minima? [MLT]
    Robert D. Kleinberg, Yuanzhi Li, Yang Yuan. 17 Feb 2018.
  • A Progressive Batching L-BFGS Method for Machine Learning [ODL]
    Raghu Bollapragada, Dheevatsa Mudigere, J. Nocedal, Hao-Jun Michael Shi, P. T. P. Tang. 15 Feb 2018.
  • Fix your classifier: the marginal value of training the last weight layer
    Elad Hoffer, Itay Hubara, Daniel Soudry. 14 Jan 2018.
  • Visualizing the Loss Landscape of Neural Nets
    Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. Neural Information Processing Systems (NeurIPS), 2017. 28 Dec 2017.
  • The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
    Siyuan Ma, Raef Bassily, M. Belkin. 18 Dec 2017.
  • MegDet: A Large Mini-Batch Object Detector [ObjD]
    Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun. 20 Nov 2017.
  • Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks [BDL, ODL]
    Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong. 20 Nov 2017.
  • Three Factors Influencing Minima in SGD
    Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey. 13 Nov 2017.
  • Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
    V. Codreanu, Damian Podareanu, V. Saletore. 12 Nov 2017.
  • Don't Decay the Learning Rate, Increase the Batch Size [ODL]
    Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le. 01 Nov 2017.
  • Regularization for Deep Learning: A Taxonomy
    J. Kukačka, Vladimir Golkov, Zorah Lähner. 29 Oct 2017.
  • The Implicit Bias of Gradient Descent on Separable Data
    Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. Journal of Machine Learning Research (JMLR), 2017. 27 Oct 2017.
  • A Bayesian Perspective on Generalization and Stochastic Gradient Descent [BDL]
    Samuel L. Smith, Quoc V. Le. 17 Oct 2017.
  • Generalization in Deep Learning [ODL]
    Kenji Kawaguchi, L. Kaelbling, Yoshua Bengio. 16 Oct 2017.
  • Stochastic Nonconvex Optimization with Large Minibatches
    Weiran Wang, Nathan Srebro. 25 Sep 2017.
  • Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
    Igor Gitman, Boris Ginsburg. 24 Sep 2017.
  • Normalized Direction-preserving Adam [ODL]
    Zijun Zhang, Lin Ma, Zongpeng Li, Chuan Wu. 13 Sep 2017.
  • Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates [AI4CE]
    L. Smith, Nicholay Topin. 23 Aug 2017.
  • Large Batch Training of Convolutional Networks [ODL]
    Yang You, Igor Gitman, Boris Ginsburg. 13 Aug 2017.
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [3DH]
    Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He. 08 Jun 2017.
  • Shallow Updates for Deep Reinforcement Learning [OffRL]
    Nir Levine, Tom Zahavy, D. Mankowitz, Aviv Tamar, Shie Mannor. 21 May 2017.
  • Sharp Minima Can Generalize For Deep Nets [ODL]
    Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio. 15 Mar 2017.
  • Exponentially vanishing sub-optimal local minima in multilayer neural networks
    Daniel Soudry, Elad Hoffer. International Conference on Learning Representations (ICLR), 2017. 19 Feb 2017.
  • Understanding deep learning requires rethinking generalization [HAI]
    Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. 10 Nov 2016.
  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation [AIMat]
    Yonghui Wu, M. Schuster, Zhiwen Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean. 26 Sep 2016.
  • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima [ODL]
    N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.
  • An overview of gradient descent optimization algorithms [ODL]
    Sebastian Ruder. 15 Sep 2016.
  • Wide Residual Networks
    Sergey Zagoruyko, N. Komodakis. 23 May 2016.