1705.08741
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
Papers citing
"Train longer, generalize better: closing the generalization gap in large batch training of neural networks"
50 / 465 papers shown
Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron
Asian Conference on Machine Learning (ACML), 2020
Jun-Kun Wang
Jacob D. Abernethy
265
1
0
04 Oct 2020
Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization
Jun-Kun Wang
Jacob D. Abernethy
293
8
0
04 Oct 2020
Improved generalization by noise enhancement
Takashi Mori
Masahito Ueda
167
3
0
28 Sep 2020
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Lei Huang
Jie Qin
Yi Zhou
Fan Zhu
Li Liu
Ling Shao
AI4CE
377
384
0
27 Sep 2020
Anomalous diffusion dynamics of learning in deep neural networks
Neural Networks (NN), 2020
Guozhang Chen
Chengqing Qu
P. Gong
279
23
0
22 Sep 2020
Unsupervised Domain Adaptation by Uncertain Feature Alignment
British Machine Vision Conference (BMVC), 2020
Tobias Ringwald
Rainer Stiefelhagen
155
7
0
14 Sep 2020
HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring
Yuhao Zhou
Qing Ye
Hailun Zhang
Jiancheng Lv
3DH
202
0
0
06 Sep 2020
S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung
Iksoo Choi
Jinhwan Park
Seokhyun Choi
Sungho Shin
ODL
145
8
0
05 Sep 2020
Binary Classification as a Phase Separation Process
Rafael Monteiro
79
0
0
05 Sep 2020
HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks
Sam Verboven
M. H. Chaudhary
Jeroen Berrevoets
Wouter Verbeke
163
7
0
26 Aug 2020
Noise-induced degeneration in online learning
Yuzuru Sato
Daiji Tsutsui
A. Fujiwara
143
2
0
24 Aug 2020
Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller
Mario Geiger
Tess E. Smidt
Frank Noé
327
81
0
19 Aug 2020
BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Y. Kim
Wonpyo Park
Jongju Shin
CVBM
326
57
0
15 Aug 2020
TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search
European Conference on Computer Vision (ECCV), 2020
Yibo Hu
Xiang Wu
Ran He
182
47
0
12 Aug 2020
Why to "grow" and "harvest" deep learning models?
I. Kulikovskikh
Tarzan Legović
VLM
75
0
0
08 Aug 2020
Implicit Regularization via Neural Feature Alignment
A. Baratin
Thomas George
César Laurent
R. Devon Hjelm
Guillaume Lajoie
Pascal Vincent
Damien Scieur
130
7
0
03 Aug 2020
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Science China Information Sciences (Sci China Inf Sci), 2020
Shen-Yi Zhao
Chang-Wei Shi
Yin-Peng Xie
Wu-Jun Li
ODL
229
10
0
28 Jul 2020
A New Look at Ghost Normalization
Neofytos Dimitriou
Ognjen Arandjelovic
227
9
0
16 Jul 2020
Analyzing and Mitigating Data Stalls in DNN Training
Proceedings of the VLDB Endowment (PVLDB), 2020
Jayashree Mohan
Amar Phanishayee
Ashish Raniwala
Vijay Chidambaram
224
120
0
14 Jul 2020
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang
G. Agrawal
150
5
0
13 Jul 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
International Conference on Machine Learning (ICML), 2020
Tyler B. Johnson
Pulkit Agrawal
Haijie Gu
Carlos Guestrin
ODL
168
40
0
09 Jul 2020
Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization
Rie Johnson
Tong Zhang
69
8
0
30 Jun 2020
Is SGD a Bayesian sampler? Well, almost
Chris Mingard
Guillermo Valle Pérez
Joar Skalse
A. Louis
BDL
303
64
0
26 Jun 2020
On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith
Erich Elsen
Soham De
MLT
217
116
0
26 Jun 2020
Smooth Adversarial Training
Cihang Xie
Mingxing Tan
Boqing Gong
Alan Yuille
Quoc V. Le
OOD
222
160
0
25 Jun 2020
How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath
Amit Deshpande
K. Subrahmanyam
AAML
122
3
0
20 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol
S. Zohren
Stephen J. Roberts
ODL
521
64
0
16 Jun 2020
PAC-Bayesian Generalization Bounds for MultiLayer Perceptrons
Xinjie Lan
Xin Guo
Kenneth Barner
195
3
0
16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen
Colin Wei
Jason D. Lee
Tengyu Ma
615
109
0
15 Jun 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
283
23
0
15 Jun 2020
Optimization Theory for ReLU Neural Networks Trained with Normalization Layers
International Conference on Machine Learning (ICML), 2020
Yonatan Dukler
Quanquan Gu
Guido Montúfar
206
30
0
11 Jun 2020
Extrapolation for Large-batch Training in Deep Learning
International Conference on Machine Learning (ICML), 2020
Tao Lin
Lingjing Kong
Sebastian U. Stich
Martin Jaggi
259
40
0
10 Jun 2020
Scaling Distributed Training with Adaptive Summation
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Olli Saarikivi
Tianju Xu
Vadim Eksarevskiy
Jaliya Ekanayake
Emad Barsoum
116
10
0
04 Jun 2020
Inherent Noise in Gradient Based Methods
Arushi Gupta
121
0
0
26 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
160
23
0
15 May 2020
2kenize: Tying Subword Sequences for Chinese Script Conversion
Pranav A
Isabelle Augenstein
193
1
0
07 May 2020
Dynamic backup workers for parallel machine learning
Chuan Xu
Giovanni Neglia
Nicola Sebastianelli
274
12
0
30 Apr 2020
The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent
Xin-Yao Qian
Diego Klabjan
ODL
147
40
0
27 Apr 2020
SIPA: A Simple Framework for Efficient Networks
Gihun Lee
Sangmin Bae
Jaehoon Oh
Seyoung Yun
121
1
0
24 Apr 2020
Predicting the outputs of finite deep neural networks trained with noisy gradients
Physical Review E (PRE), 2020
Gadi Naveh
Oded Ben-David
H. Sompolinsky
Zohar Ringel
438
31
0
02 Apr 2020
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models
A. Pătraşcu
C. Paduraru
Paul Irofti
122
0
0
30 Mar 2020
Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training
Namhoon Lee
Thalaiyasingam Ajanthan
Juil Sock
Martin Jaggi
207
2
0
25 Mar 2020
Robust and On-the-fly Dataset Denoising for Image Classification
European Conference on Computer Vision (ECCV), 2020
Jiaming Song
Lunjia Hu
Michael Auli
Yann N. Dauphin
Tengyu Ma
NoLa
OOD
186
13
0
24 Mar 2020
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
International Conference on Machine Learning (ICML), 2020
Alnur Ali
Guang Cheng
Robert Tibshirani
177
81
0
17 Mar 2020
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang
Shaoshuai Shi
Wei Wang
Yue Liu
Xiaowen Chu
249
54
0
10 Mar 2020
AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Majed El Helou
Frederike Dumbgen
Sabine Süsstrunk
CLL
AI4CE
134
2
0
07 Mar 2020
Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond
Kaidi Xu
Zhouxing Shi
Huan Zhang
Yihan Wang
Kai-Wei Chang
Shiyu Huang
B. Kailkhura
Xinyu Lin
Cho-Jui Hsieh
AAML
313
15
0
28 Feb 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De
Samuel L. Smith
ODL
248
20
0
24 Feb 2020
The Two Regimes of Deep Network Training
Guillaume Leclerc
Aleksander Madry
197
49
0
24 Feb 2020
Unique Properties of Flat Minima in Deep Networks
International Conference on Machine Learning (ICML), 2020
Rotem Mulayoff
T. Michaeli
ODL
105
4
0
11 Feb 2020