Train longer, generalize better: closing the generalization gap in large batch training of neural networks
arXiv:1705.08741 · v2 (latest) · 24 May 2017
Elad Hoffer, Itay Hubara, Daniel Soudry
ODL

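For context on the citation list below: the cited paper argues that the large-batch generalization gap comes from a reduced number of parameter updates rather than from batch size itself, and counters it by training longer, scaling the learning rate with the square root of the batch-size ratio, and computing batch-norm statistics over small "ghost" batches. A minimal PyTorch sketch of those two ingredients (the class name, the `ghost_batch_size` default, and the helper function are ours, for illustration only):

```python
import torch
import torch.nn as nn

class GhostBatchNorm(nn.Module):
    """Ghost Batch Normalization (Hoffer et al., 2017): normalize each
    small 'ghost' slice of a large batch with its own statistics."""

    def __init__(self, num_features: int, ghost_batch_size: int = 32):
        super().__init__()
        self.ghost_batch_size = ghost_batch_size
        self.bn = nn.BatchNorm1d(num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Split the large batch into ghost batches; each chunk is
            # normalized with its own statistics, while running stats
            # still accumulate inside self.bn for use at eval time.
            chunks = x.split(self.ghost_batch_size, dim=0)
            return torch.cat([self.bn(c) for c in chunks], dim=0)
        # Evaluation uses the accumulated running statistics as usual.
        return self.bn(x)

# The paper's square-root scaling rule: when growing the batch from
# base_batch to large_batch, multiply the base learning rate by
# sqrt(large_batch / base_batch) rather than the linear ratio.
def sqrt_scaled_lr(base_lr: float, base_batch: int, large_batch: int) -> float:
    return base_lr * (large_batch / base_batch) ** 0.5
```
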
Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

50 / 465 papers shown

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama
ODL
418 · 18 · 0
10 Feb 2020

Large Batch Training Does Not Need Warmup
Zhouyuan Huo, Bin Gu, Heng-Chiao Huang
AI4CE, ODL
157 · 5 · 0
04 Feb 2020

Variance Reduction with Sparse Gradients
International Conference on Learning Representations (ICLR), 2020
Melih Elibol, Lihua Lei, Sai Li
131 · 24 · 0
27 Jan 2020

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
International Conference on Learning Representations (ICLR), 2020
Jinlong Liu, Guo-qing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang
AI4CE
354 · 57 · 0
21 Jan 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
International Conference on Learning Representations (ICLR), 2020
Vipul Gupta, S. Serrano, D. DeCoste
MoMe
290 · 73 · 0
07 Jan 2020

On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Umut Simsekli, Mert Gurbuzbalaban, T. H. Nguyen, G. Richard, Levent Sagun
323 · 64 · 0
29 Nov 2019

Auto-Precision Scaling for Distributed Deep Learning
Information Security Conference (IS), 2019
Ruobing Han, J. Demmel, Yang You
171 · 5 · 0
20 Nov 2019

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Abigail Z. Jacobs
OOD
290 · 1,451 · 0
20 Nov 2019

Information-Theoretic Local Minima Characterization and Regularization
International Conference on Machine Learning (ICML), 2019
Zhiwei Jia, Hao Su
243 · 22 · 0
19 Nov 2019

Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
Neural Information Processing Systems (NeurIPS), 2019
Maximilian Igl, K. Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann
OffRL
220 · 188 · 0
28 Oct 2019

A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Koyel Mukherjee, Alind Khare, Ashish Verma
149 · 20 · 0
25 Oct 2019

Gradient Sparsification for Asynchronous Distributed Training
Zijie Yan
FedML
63 · 2 · 0
24 Oct 2019

Improved Generalization Bounds of Group Invariant / Equivariant Deep Networks via Quotient Feature Spaces
Conference on Uncertainty in Artificial Intelligence (UAI), 2019
Akiyoshi Sannai, Masaaki Imaizumi, M. Kawano
MLT
214 · 35 · 0
15 Oct 2019

On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl
459 · 289 · 0
11 Oct 2019

SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead
IEEE Transactions on Computers (IEEE Trans. Comput.), 2019
A. Masullo, Ligang He, Toby Perrett, Rui Mao, Carsten Maple, Majid Mirmehdi
783 · 387 · 0
03 Oct 2019

How noise affects the Hessian spectrum in overparameterized neural networks
Ming-Bo Wei, D. Schwab
259 · 32 · 0
01 Oct 2019

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
International Conference on Learning Representations (ICLR), 2019
Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry
193 · 23 · 0
26 Sep 2019

Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
International Conference on Learning Representations (ICLR), 2019
Cheolhyoung Lee, Dong Wang, Wanmo Kang
MoE
503 · 228 · 0
25 Sep 2019

Scalable Kernel Learning via the Discriminant Information
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Mert Al, Zejiang Hou, S. Kung
143 · 1 · 0
23 Sep 2019

TabNet: Attentive Interpretable Tabular Learning
AAAI Conference on Artificial Intelligence (AAAI), 2019
Sercan O. Arik, Tomas Pfister
LMTD
819 · 1,859 · 0
20 Aug 2019

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Hao Jin, Dachao Lin, Zhihua Zhang
ODL
108 · 2 · 0
18 Aug 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
210 · 25 · 0
12 Aug 2019

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
IEEE Micro, 2019
Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta
271 · 68 · 0
30 Jul 2019

Bias of Homotopic Gradient Descent for the Hinge Loss
Applied Mathematics and Optimization (AMO), 2019
Denali Molitor, Deanna Needell, Rachel A. Ward
121 · 6 · 0
26 Jul 2019

Learning Neural Networks with Adaptive Regularization
Neural Information Processing Systems (NeurIPS), 2019
Han Zhao, Yifan Hao, Ruslan Salakhutdinov, Geoffrey J. Gordon
108 · 16 · 0
14 Jul 2019

Faster Neural Network Training with Data Echoing
Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl
350 · 51 · 0
12 Jul 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
Yuanzhi Li, Colin Wei, Tengyu Ma
312 · 328 · 0
10 Jul 2019

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Neural Information Processing Systems (NeurIPS), 2019
Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger C. Grosse
418 · 176 · 0
09 Jul 2019

Stochastic Gradient and Langevin Processes
Xiang Cheng, Dong Yin, Peter L. Bartlett, Sai Li
275 · 5 · 0
07 Jul 2019

Time-to-Event Prediction with Neural Networks and Cox Regression
Journal of Machine Learning Research (JMLR), 2019
Håvard Kvamme, Ørnulf Borgan, Ida Scheel
563 · 404 · 0
01 Jul 2019

On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu
MLT
221 · 10 · 0
18 Jun 2019

Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi
MLT
239 · 100 · 0
12 Jun 2019

Toward Interpretable Music Tagging with Self-Attention
Minz Won, Sanghyuk Chun, Xavier Serra
ViT
168 · 85 · 0
12 Jun 2019

The Implicit Bias of AdaGrad on Separable Data
Neural Information Processing Systems (NeurIPS), 2019
Qian Qian, Xiaoyuan Qian
132 · 24 · 0
09 Jun 2019

Four Things Everyone Should Know to Improve Batch Normalization
International Conference on Learning Representations (ICLR), 2019
Cecilia Summers, M. Dinneen
202 · 56 · 0
09 Jun 2019

Inductive Bias of Gradient Descent based Adversarial Training on Separable Data
Yan Li, Ethan X. Fang, Huan Xu, T. Zhao
269 · 18 · 0
07 Jun 2019

Automated Machine Learning: State-of-The-Art and Open Challenges
Radwa El Shawi, Mohamed Maher, Sherif Sakr
187 · 189 · 0
05 Jun 2019

Implicit Regularization in Deep Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2019
Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
AI4CE
396 · 562 · 0
31 May 2019

Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Neural Information Processing Systems (NeurIPS), 2019
Aditya Golatkar, Alessandro Achille, Stefano Soatto
147 · 105 · 0
30 May 2019

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
International Conference on Machine Learning (ICML), 2019
Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry
195 · 96 · 0
17 May 2019

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Dynamic Languages Symposium (DLS), 2019
Wu Dong, Murat Keçeli, Rafael Vescovi, Hanyu Li, Corey Adams, ..., T. Uram, V. Vishwanath, N. Ferrier, B. Kasthuri, P. Littlewood
FedML, AI4CE
334 · 10 · 0
13 May 2019

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Neural Information Processing Systems (NeurIPS), 2019
Colin Wei, Tengyu Ma
382 · 122 · 0
09 May 2019

Batch Normalization is a Cause of Adversarial Vulnerability
A. Galloway, A. Golubeva, T. Tanay, M. Moussa, Graham W. Taylor
ODL, AAML
239 · 84 · 0
06 May 2019

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Yanghua Peng, Hang Zhang, Yifei Ma, Tong He, Zhi-Li Zhang, Sheng Zha, Mu Li
171 · 24 · 0
26 Apr 2019

Low-Memory Neural Network Training: A Technical Report
N. Sohoni, Christopher R. Aberger, Megan Leszczynski, Jian Zhang, Christopher Ré
254 · 110 · 0
24 Apr 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
ODL
887 · 1,113 · 0
01 Apr 2019

On the Stability and Generalization of Learning with Kernel Activation Functions
M. Cirillo, Simone Scardapane, S. Van Vaerenbergh, A. Uncini
138 · 0 · 0
28 Mar 2019

TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications
Frederik Heber, Zofia Trstanova, Benedict Leimkuhler
173 · 0 · 0
20 Mar 2019

Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma, Gabe Montague, Jiayu Ye, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
214 · 24 · 0
14 Mar 2019

Communication-efficient distributed SGD with Sketching
Nikita Ivkin, D. Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, R. Arora
FedML
269 · 220 · 0
12 Mar 2019

Page 7 of 10