ResearchTrend.AI


Train longer, generalize better: closing the generalization gap in large batch training of neural networks

24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
    ODL

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Showing 50 of 465 citing papers.
PopulAtion Parameter Averaging (PAPA)
Alexia Jolicoeur-Martineau
Emy Gervais
Kilian Fatras
Yan Zhang
Damien Scieur
MoMe
06 Apr 2023
Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Haoyi Xiong
Xuhong Li
Bo Yu
Zhanxing Zhu
Dongrui Wu
Dejing Dou
NoLa
01 Apr 2023
Solving Regularized Exp, Cosh and Sinh Regression Problems
Zhihang Li
Zhao Song
Wanrong Zhu
28 Mar 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset. IEEE Journal of Translational Engineering in Health and Medicine (IEEE JTEHM), 2023
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE, MedIm
22 Mar 2023
Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization
Peiyuan Zhang
Jiaye Teng
J.N. Zhang
19 Mar 2023
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning. International Conference on Learning Representations (ICLR), 2023
Ziheng Qin
Kaidi Wang
Zangwei Zheng
Jianyang Gu
Xiang Peng
...
Daquan Zhou
Lei Shang
Baigui Sun
Xuansong Xie
Yang You
08 Mar 2023
How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy. Journal of Artificial Intelligence Research (JAIR), 2023
Natalia Ponomareva
Hussein Hazimeh
Alexey Kurakin
Zheng Xu
Carson E. Denison
H. B. McMahan
Sergei Vassilvitskii
Steve Chien
Abhradeep Thakurta
01 Mar 2023
On the Training Instability of Shuffling SGD with Batch Normalization. International Conference on Machine Learning (ICML), 2023
David Wu
Chulhee Yun
S. Sra
24 Feb 2023
MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning. Asian Conference on Computer Vision (ACCV), 2023
Caoyun Fan
Wenqing Chen
Jidong Tian
Yitian Li
Hao He
Yaohui Jin
18 Feb 2023
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability. Neural Information Processing Systems (NeurIPS), 2023
Mathieu Even
Scott Pesme
Suriya Gunasekar
Nicolas Flammarion
17 Feb 2023
Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy
Cheolhyoung Lee
Dong Wang
08 Feb 2023
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning. International Conference on Machine Learning (ICML), 2023
Antonio Sclocchi
Mario Geiger
Matthieu Wyart
31 Jan 2023
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis. International Conference on Machine Learning (ICML), 2023
Axel Sauer
Tero Karras
S. Laine
Andreas Geiger
Timo Aila
23 Jan 2023
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Jaewook Lee
16 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling. IEEE Transactions on Multimedia (IEEE TMM), 2022
Xin Ma
Yu Xie
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
31 Dec 2022
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats. IEEE Winter Conference on Applications of Computer Vision (WACV), 2022
István Sárándi
Alexander Hermans
Bastian Leibe
3DH
29 Dec 2022
Maximal Initial Learning Rates in Deep ReLU Networks. International Conference on Machine Learning (ICML), 2022
Gaurav M. Iyer
Boris Hanin
David Rolnick
14 Dec 2022
FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning. IEEE International Symposium on Workload Characterization (IISWC), 2022
Young Geun Kim
Carole-Jean Wu
FedML
30 Nov 2022
ModelDiff: A Framework for Comparing Learning Algorithms. International Conference on Machine Learning (ICML), 2022
Harshay Shah
Sung Min Park
Andrew Ilyas
Aleksander Madry
SyDa
22 Nov 2022
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Alexander Nikulin
Vladislav Kurenkov
Denis Tarasov
Dmitry Akimov
Sergey Kolesnikov
OffRL
20 Nov 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States. Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Ziqiao Wang
Yongyi Mao
19 Nov 2022
MogaNet: Multi-order Gated Aggregation Network. International Conference on Learning Representations (ICLR), 2022
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
07 Nov 2022
Class Interference of Deep Neural Networks
Dongcui Diao
Hengshuai Yao
Bei Jiang
31 Oct 2022
Perturbation Analysis of Neural Collapse. International Conference on Machine Learning (ICML), 2022
Tom Tirer
Haoxiang Huang
Jonathan Niles-Weed
AAML
29 Oct 2022
Deep Neural Networks as the Semi-classical Limit of Topological Quantum Neural Networks: The problem of generalisation
A. Marcianò
De-Wei Chen
Filippo Fabrocini
C. Fields
M. Lulli
Emanuele Zappala
GNN
25 Oct 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun
Konstantinos Papadopoulos
Djamila Aouada
AI4CE
21 Oct 2022
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhiyuan Zhang
Lingjuan Lyu
Jiabo He
Chenguang Wang
Xu Sun
AAML
18 Oct 2022
AnalogVNN: A fully modular framework for modeling and optimizing photonic neural networks. APL Machine Learning (AML), 2022
Vivswan Shah
Nathan Youngblood
14 Oct 2022
Vision Transformers provably learn spatial structure. Neural Information Processing Systems (NeurIPS), 2022
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT, MLT
13 Oct 2022
MSRL: Distributed Reinforcement Learning with Dataflow Fragments. USENIX Annual Technical Conference (USENIX ATC), 2022
Huanzhou Zhu
Bo Zhao
Gang Chen
Weifeng Chen
Yijie Chen
Liang Shi
Yaodong Yang
Peter R. Pietzuch
Lei Chen
OffRL, MoE
03 Oct 2022
Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning. Symposium on Networked Systems Design and Implementation (NSDI), 2022
Pengfei Zheng
Rui Pan
Tarannum Khan
Shivaram Venkataraman
Aditya Akella
30 Sep 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity. Neural Information Processing Systems (NeurIPS), 2022
Benoit Dherin
Michael Munn
M. Rosca
David Barrett
27 Sep 2022
Rethinking Performance Gains in Image Dehazing Networks
Yuda Song
Yang Zhou
Hui Qian
Xin Du
SSeg
23 Sep 2022
Batch Layer Normalization, A new normalization layer for CNNs and RNN. International Conference on Advances in Artificial Intelligence (ICAAI), 2022
A. Ziaee
Erion Çano
19 Sep 2022
On the generalization of learning algorithms that do not converge. Neural Information Processing Systems (NeurIPS), 2022
N. Chandramoorthy
Andreas Loukas
Khashayar Gatmiry
Stefanie Jegelka
MLT
16 Aug 2022
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training. Symposium on Networked Systems Design and Implementation (NSDI), 2022
Jie You
Jaehoon Chung
Mosharaf Chowdhury
12 Aug 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale. Knowledge Discovery and Data Mining (KDD), 2022
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
19 Jul 2022
Efficient Augmentation for Imbalanced Deep Learning. IEEE International Conference on Data Engineering (ICDE), 2022
Damien Dablain
C. Bellinger
Bartosz Krawczyk
Nitesh Chawla
13 Jul 2022
Towards understanding how momentum improves generalization in deep learning. International Conference on Machine Learning (ICML), 2022
Samy Jelassi
Yuanzhi Li
ODL, MLT, AI4CE
13 Jul 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning. IEEE Transactions on Cloud Computing (IEEE TCC), 2022
Lin Zhang
Shaoshuai Shi
Wei Wang
Yue Liu
30 Jun 2022
Disentangling Model Multiplicity in Deep Learning
Ari Heljakka
Martin Trapp
Arno Solin
17 Jun 2022
Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions. Neural Information Processing Systems (NeurIPS), 2022
Courtney Paquette
Elliot Paquette
Ben Adlam
Jeffrey Pennington
15 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction. Neural Information Processing Systems (NeurIPS), 2022
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
14 Jun 2022
Towards Understanding Sharpness-Aware Minimization. International Conference on Machine Learning (ICML), 2022
Maksym Andriushchenko
Nicolas Flammarion
AAML
13 Jun 2022
Modeling the Machine Learning Multiverse. Neural Information Processing Systems (NeurIPS), 2022
Samuel J. Bell
Onno P. Kampman
Jesse Dodge
Neil D. Lawrence
13 Jun 2022
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Vimal Thilak
Etai Littwin
Shuangfei Zhai
Omid Saremi
Roni Paiss
J. Susskind
10 Jun 2022
Improved two-stage hate speech classification for twitter based on Deep Neural Networks
Georgios K. Pitsilis
08 Jun 2022
Generalization Error Bounds for Deep Neural Networks Trained by SGD
Mingze Wang
Chao Ma
07 Jun 2022
Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules. Neural Information Processing Systems (NeurIPS), 2022
Yuhan Helena Liu
Arna Ghosh
Blake A. Richards
E. Shea-Brown
Guillaume Lajoie
02 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
01 Jun 2022
Page 3 of 10