
Train longer, generalize better: closing the generalization gap in large batch training of neural networks

24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
arXiv: 1705.08741

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

50 / 465 papers shown
Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron
Asian Conference on Machine Learning (ACML), 2020
Jun-Kun Wang
Jacob D. Abernethy
04 Oct 2020
Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization
Jun-Kun Wang
Jacob D. Abernethy
04 Oct 2020
Improved generalization by noise enhancement
Takashi Mori
Masahito Ueda
28 Sep 2020
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Lei Huang
Jie Qin
Yi Zhou
Fan Zhu
Li Liu
Ling Shao
27 Sep 2020
Anomalous diffusion dynamics of learning in deep neural networks
Neural Networks (NN), 2020
Guozhang Chen
Chengqing Qu
P. Gong
22 Sep 2020
Unsupervised Domain Adaptation by Uncertain Feature Alignment
British Machine Vision Conference (BMVC), 2020
Tobias Ringwald
Rainer Stiefelhagen
14 Sep 2020
HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring
Yuhao Zhou
Qing Ye
Hailun Zhang
Jiancheng Lv
06 Sep 2020
S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung
Iksoo Choi
Jinhwan Park
Seokhyun Choi
Sungho Shin
05 Sep 2020
Binary Classification as a Phase Separation Process
Rafael Monteiro
05 Sep 2020
HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks
Sam Verboven
M. H. Chaudhary
Jeroen Berrevoets
Wouter Verbeke
26 Aug 2020
Noise-induced degeneration in online learning
Yuzuru Sato
Daiji Tsutsui
A. Fujiwara
24 Aug 2020
Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller
Mario Geiger
Tess E. Smidt
Frank Noé
19 Aug 2020
BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Y. Kim
Wonpyo Park
Jongju Shin
15 Aug 2020
TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search
European Conference on Computer Vision (ECCV), 2020
Yibo Hu
Xiang Wu
Ran He
12 Aug 2020
Why to "grow" and "harvest" deep learning models?
Why to "grow" and "harvest" deep learning models?
I. Kulikovskikh
Tarzan Legović
08 Aug 2020
Implicit Regularization via Neural Feature Alignment
A. Baratin
Thomas George
César Laurent
R. Devon Hjelm
Guillaume Lajoie
Pascal Vincent
Damien Scieur
03 Aug 2020
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Science China Information Sciences (Sci China Inf Sci), 2020
Shen-Yi Zhao
Chang-Wei Shi
Yin-Peng Xie
Wu-Jun Li
28 Jul 2020
A New Look at Ghost Normalization
Neofytos Dimitriou
Ognjen Arandjelovic
16 Jul 2020
Analyzing and Mitigating Data Stalls in DNN Training
Proceedings of the VLDB Endowment (PVLDB), 2020
Jayashree Mohan
Amar Phanishayee
Ashish Raniwala
Vijay Chidambaram
14 Jul 2020
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang
G. Agrawal
13 Jul 2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
International Conference on Machine Learning (ICML), 2020
Tyler B. Johnson
Pulkit Agrawal
Haijie Gu
Carlos Guestrin
09 Jul 2020
Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization
Rie Johnson
Tong Zhang
30 Jun 2020
Is SGD a Bayesian sampler? Well, almost
Chris Mingard
Guillermo Valle Pérez
Joar Skalse
A. Louis
26 Jun 2020
On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith
Erich Elsen
Soham De
26 Jun 2020
Smooth Adversarial Training
Cihang Xie
Mingxing Tan
Boqing Gong
Alan Yuille
Quoc V. Le
25 Jun 2020
How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath
Amit Deshpande
K. Subrahmanyam
20 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol
S. Zohren
Stephen J. Roberts
16 Jun 2020
PAC-Bayesian Generalization Bounds for MultiLayer Perceptrons
Xinjie Lan
Xin Guo
Kenneth Barner
16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen
Colin Wei
Jason D. Lee
Tengyu Ma
15 Jun 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
15 Jun 2020
Optimization Theory for ReLU Neural Networks Trained with Normalization Layers
International Conference on Machine Learning (ICML), 2020
Yonatan Dukler
Quanquan Gu
Guido Montúfar
11 Jun 2020
Extrapolation for Large-batch Training in Deep Learning
International Conference on Machine Learning (ICML), 2020
Tao Lin
Lingjing Kong
Sebastian U. Stich
Martin Jaggi
10 Jun 2020
Scaling Distributed Training with Adaptive Summation
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Olli Saarikivi
Tianju Xu
Vadim Eksarevskiy
Jaliya Ekanayake
Emad Barsoum
04 Jun 2020
Inherent Noise in Gradient Based Methods
Arushi Gupta
26 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
15 May 2020
2kenize: Tying Subword Sequences for Chinese Script Conversion
Pranav A
Isabelle Augenstein
07 May 2020
Dynamic backup workers for parallel machine learning
Chuan Xu
Giovanni Neglia
Nicola Sebastianelli
30 Apr 2020
The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent
Xin-Yao Qian
Diego Klabjan
27 Apr 2020
SIPA: A Simple Framework for Efficient Networks
Gihun Lee
Sangmin Bae
Jaehoon Oh
Seyoung Yun
24 Apr 2020
Predicting the outputs of finite deep neural networks trained with noisy gradients
Physical Review E (PRE), 2020
Gadi Naveh
Oded Ben-David
H. Sompolinsky
Zohar Ringel
02 Apr 2020
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models
A. Pătraşcu
C. Paduraru
Paul Irofti
30 Mar 2020
Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training
Namhoon Lee
Thalaiyasingam Ajanthan
Juil Sock
Martin Jaggi
25 Mar 2020
Robust and On-the-fly Dataset Denoising for Image Classification
European Conference on Computer Vision (ECCV), 2020
Jiaming Song
Lunjia Hu
Michael Auli
Yann N. Dauphin
Tengyu Ma
24 Mar 2020
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
International Conference on Machine Learning (ICML), 2020
Alnur Ali
Guang Cheng
Robert Tibshirani
17 Mar 2020
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang
Shaoshuai Shi
Wei Wang
Yue Liu
Xiaowen Chu
10 Mar 2020
AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Majed El Helou
Frederike Dumbgen
Sabine Süsstrunk
07 Mar 2020
Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond
Kaidi Xu
Zhouxing Shi
Huan Zhang
Yihan Wang
Kai-Wei Chang
Shiyu Huang
B. Kailkhura
Xinyu Lin
Cho-Jui Hsieh
28 Feb 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De
Samuel L. Smith
24 Feb 2020
The Two Regimes of Deep Network Training
Guillaume Leclerc
Aleksander Madry
24 Feb 2020
Unique Properties of Flat Minima in Deep Networks
International Conference on Machine Learning (ICML), 2020
Rotem Mulayoff
T. Michaeli
11 Feb 2020
Page 6 of 10