arXiv: 1705.08741
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
Papers citing
"Train longer, generalize better: closing the generalization gap in large batch training of neural networks"
50 / 465 papers shown
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie
Issei Sato
Masashi Sugiyama
ODL
418
18
0
10 Feb 2020
Large Batch Training Does Not Need Warmup
Zhouyuan Huo
Bin Gu
Heng Huang
AI4CE
ODL
157
5
0
04 Feb 2020
Variance Reduction with Sparse Gradients
International Conference on Learning Representations (ICLR), 2020
Melih Elibol
Lihua Lei
Sai Li
131
24
0
27 Jan 2020
Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
International Conference on Learning Representations (ICLR), 2020
Jinlong Liu
Guo-qing Jiang
Yunzhi Bai
Ting Chen
Huayan Wang
AI4CE
354
57
0
21 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
International Conference on Learning Representations (ICLR), 2020
Vipul Gupta
S. Serrano
D. DeCoste
MoMe
290
73
0
07 Jan 2020
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Umut Simsekli
Mert Gurbuzbalaban
T. H. Nguyen
G. Richard
Levent Sagun
323
64
0
29 Nov 2019
Auto-Precision Scaling for Distributed Deep Learning
Information Security Conference (IS), 2019
Ruobing Han
J. Demmel
Yang You
171
5
0
20 Nov 2019
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Shiori Sagawa
Pang Wei Koh
Tatsunori B. Hashimoto
Abigail Z. Jacobs
OOD
290
1,451
0
20 Nov 2019
Information-Theoretic Local Minima Characterization and Regularization
International Conference on Machine Learning (ICML), 2019
Zhiwei Jia
Hao Su
243
22
0
19 Nov 2019
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
Neural Information Processing Systems (NeurIPS), 2019
Maximilian Igl
K. Ciosek
Yingzhen Li
Sebastian Tschiatschek
Cheng Zhang
Sam Devlin
Katja Hofmann
OffRL
220
188
0
28 Oct 2019
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Koyel Mukherjee
Alind Khare
Ashish Verma
149
20
0
25 Oct 2019
Gradient Sparsification for Asynchronous Distributed Training
Zijie Yan
FedML
63
2
0
24 Oct 2019
Improved Generalization Bounds of Group Invariant / Equivariant Deep Networks via Quotient Feature Spaces
Conference on Uncertainty in Artificial Intelligence (UAI), 2019
Akiyoshi Sannai
Masaaki Imaizumi
M. Kawano
MLT
214
35
0
15 Oct 2019
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
459
289
0
11 Oct 2019
SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead
IEEE transactions on computers (IEEE Trans. Comput.), 2019
A. Masullo
Ligang He
Toby Perrett
Rui Mao
Carsten Maple
Majid Mirmehdi
783
387
0
03 Oct 2019
How noise affects the Hessian spectrum in overparameterized neural networks
Ming-Bo Wei
D. Schwab
259
32
0
01 Oct 2019
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
International Conference on Learning Representations (ICLR), 2019
Niv Giladi
Mor Shpigel Nacson
Elad Hoffer
Daniel Soudry
193
23
0
26 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
International Conference on Learning Representations (ICLR), 2019
Cheolhyoung Lee
Dong Wang
Wanmo Kang
MoE
503
228
0
25 Sep 2019
Scalable Kernel Learning via the Discriminant Information
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Mert Al
Zejiang Hou
S. Kung
143
1
0
23 Sep 2019
TabNet: Attentive Interpretable Tabular Learning
AAAI Conference on Artificial Intelligence (AAAI), 2019
Sercan O. Arik
Tomas Pfister
LMTD
819
1,859
0
20 Aug 2019
Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Hao Jin
Dachao Lin
Zhihua Zhang
ODL
108
2
0
18 Aug 2019
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Elad Hoffer
Berry Weinstein
Itay Hubara
Tal Ben-Nun
Torsten Hoefler
Daniel Soudry
210
25
0
12 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
IEEE Micro (IEEE Micro), 2019
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
271
68
0
30 Jul 2019
Bias of Homotopic Gradient Descent for the Hinge Loss
Applied Mathematics and Optimization (AMO), 2019
Denali Molitor
Deanna Needell
Rachel A. Ward
121
6
0
26 Jul 2019
Learning Neural Networks with Adaptive Regularization
Neural Information Processing Systems (NeurIPS), 2019
Han Zhao
Yifan Hao
Ruslan Salakhutdinov
Geoffrey J. Gordon
108
16
0
14 Jul 2019
Faster Neural Network Training with Data Echoing
Dami Choi
Alexandre Passos
Christopher J. Shallue
George E. Dahl
350
51
0
12 Jul 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
Yuanzhi Li
Colin Wei
Tengyu Ma
312
328
0
10 Jul 2019
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Neural Information Processing Systems (NeurIPS), 2019
Guodong Zhang
Lala Li
Zachary Nado
James Martens
Sushant Sachdeva
George E. Dahl
Christopher J. Shallue
Roger C. Grosse
418
176
0
09 Jul 2019
Stochastic Gradient and Langevin Processes
Xiang Cheng
Dong Yin
Peter L. Bartlett
Sai Li
275
5
0
07 Jul 2019
Time-to-Event Prediction with Neural Networks and Cox Regression
Journal of machine learning research (JMLR), 2019
Håvard Kvamme
Ørnulf Borgan
Ida Scheel
563
404
0
01 Jul 2019
On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu
Wenqing Hu
Haoyi Xiong
Jun Huan
Vladimir Braverman
Zhanxing Zhu
MLT
221
10
0
18 Jun 2019
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
Samet Oymak
Zalan Fabian
Mingchen Li
Mahdi Soltanolkotabi
MLT
239
100
0
12 Jun 2019
Toward Interpretable Music Tagging with Self-Attention
Minz Won
Sanghyuk Chun
Xavier Serra
ViT
168
85
0
12 Jun 2019
The Implicit Bias of AdaGrad on Separable Data
Neural Information Processing Systems (NeurIPS), 2019
Qian Qian
Xiaoyuan Qian
132
24
0
09 Jun 2019
Four Things Everyone Should Know to Improve Batch Normalization
International Conference on Learning Representations (ICLR), 2019
Cecilia Summers
M. Dinneen
202
56
0
09 Jun 2019
Inductive Bias of Gradient Descent based Adversarial Training on Separable Data
Yan Li
Ethan X. Fang
Huan Xu
T. Zhao
269
18
0
07 Jun 2019
Automated Machine Learning: State-of-The-Art and Open Challenges
Radwa El Shawi
Mohamed Maher
Sherif Sakr
187
189
0
05 Jun 2019
Implicit Regularization in Deep Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2019
Sanjeev Arora
Nadav Cohen
Wei Hu
Yuping Luo
AI4CE
396
562
0
31 May 2019
Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Neural Information Processing Systems (NeurIPS), 2019
Aditya Golatkar
Alessandro Achille
Stefano Soatto
147
105
0
30 May 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
International Conference on Machine Learning (ICML), 2019
Mor Shpigel Nacson
Suriya Gunasekar
Jason D. Lee
Nathan Srebro
Daniel Soudry
195
96
0
17 May 2019
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Workshop on Deep Learning on Supercomputers (DLS), 2019
Wu Dong
Murat Keçeli
Rafael Vescovi
Hanyu Li
Corey Adams
...
T. Uram
V. Vishwanath
N. Ferrier
B. Kasthuri
P. Littlewood
FedML
AI4CE
334
10
0
13 May 2019
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Neural Information Processing Systems (NeurIPS), 2019
Colin Wei
Tengyu Ma
382
122
0
09 May 2019
Batch Normalization is a Cause of Adversarial Vulnerability
A. Galloway
A. Golubeva
T. Tanay
M. Moussa
Graham W. Taylor
ODL
AAML
239
84
0
06 May 2019
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Yanghua Peng
Hang Zhang
Yifei Ma
Tong He
Zhi-Li Zhang
Sheng Zha
Mu Li
171
24
0
26 Apr 2019
Low-Memory Neural Network Training: A Technical Report
N. Sohoni
Christopher R. Aberger
Megan Leszczynski
Jian Zhang
Christopher Ré
254
110
0
24 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
887
1,113
0
01 Apr 2019
On the Stability and Generalization of Learning with Kernel Activation Functions
M. Cirillo
Simone Scardapane
S. Van Vaerenbergh
A. Uncini
138
0
0
28 Mar 2019
TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications
Frederik Heber
Zofia Trstanova
Benedict Leimkuhler
173
0
0
20 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma
Gabe Montague
Jiayu Ye
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
214
24
0
14 Mar 2019
Communication-efficient distributed SGD with Sketching
Nikita Ivkin
D. Rothchild
Enayat Ullah
Vladimir Braverman
Ion Stoica
R. Arora
FedML
269
220
0
12 Mar 2019
Page 7 of 10