1705.08741
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
Papers citing
"Train longer, generalize better: closing the generalization gap in large batch training of neural networks"
50 / 465 papers shown
PopulAtion Parameter Averaging (PAPA)
Alexia Jolicoeur-Martineau
Emy Gervais
Kilian Fatras
Yan Zhang
Damien Scieur
MoMe
491
25
0
06 Apr 2023
Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Haoyi Xiong
Xuhong Li
Bo Yu
Zhanxing Zhu
Dongrui Wu
Dejing Dou
NoLa
155
0
0
01 Apr 2023
Solving Regularized Exp, Cosh and Sinh Regression Problems
Zhihang Li
Zhao Song
Wanrong Zhu
211
41
0
28 Mar 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
IEEE Journal of Translational Engineering in Health and Medicine (IEEE JTEHM), 2023
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
347
8
0
22 Mar 2023
Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization
Peiyuan Zhang
Jiaye Teng
J.N. Zhang
308
5
0
19 Mar 2023
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
International Conference on Learning Representations (ICLR), 2023
Ziheng Qin
Kaidi Wang
Zangwei Zheng
Jianyang Gu
Xiang Peng
...
Daquan Zhou
Lei Shang
Baigui Sun
Xuansong Xie
Yang You
344
77
0
08 Mar 2023
How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy
Journal of Artificial Intelligence Research (JAIR), 2023
Natalia Ponomareva
Hussein Hazimeh
Alexey Kurakin
Zheng Xu
Carson E. Denison
H. B. McMahan
Sergei Vassilvitskii
Steve Chien
Abhradeep Thakurta
508
242
0
01 Mar 2023
On the Training Instability of Shuffling SGD with Batch Normalization
International Conference on Machine Learning (ICML), 2023
David Wu
Chulhee Yun
S. Sra
348
6
0
24 Feb 2023
MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning
Asian Conference on Computer Vision (ACCV), 2023
Caoyun Fan
Wenqing Chen
Jidong Tian
Yitian Li
Hao He
Yaohui Jin
112
4
0
18 Feb 2023
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
Neural Information Processing Systems (NeurIPS), 2023
Mathieu Even
Scott Pesme
Suriya Gunasekar
Nicolas Flammarion
351
20
0
17 Feb 2023
Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy
Cheolhyoung Lee
Dong Wang
133
0
0
08 Feb 2023
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
International Conference on Machine Learning (ICML), 2023
Antonio Sclocchi
Mario Geiger
Matthieu Wyart
224
7
0
31 Jan 2023
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
International Conference on Machine Learning (ICML), 2023
Axel Sauer
Tero Karras
S. Laine
Andreas Geiger
Timo Aila
327
267
0
23 Jan 2023
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Jaewook Lee
173
15
0
16 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
IEEE transactions on multimedia (IEEE TMM), 2022
Xin Ma
Yu Xie
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
351
16
0
31 Dec 2022
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
István Sárándi
Alexander Hermans
Bastian Leibe
3DH
249
50
0
29 Dec 2022
Maximal Initial Learning Rates in Deep ReLU Networks
International Conference on Machine Learning (ICML), 2022
Gaurav M. Iyer
Boris Hanin
David Rolnick
290
14
0
14 Dec 2022
FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning
IEEE International Symposium on Workload Characterization (IISWC), 2022
Young Geun Kim
Carole-Jean Wu
FedML
237
5
0
30 Nov 2022
ModelDiff: A Framework for Comparing Learning Algorithms
International Conference on Machine Learning (ICML), 2022
Harshay Shah
Sung Min Park
Andrew Ilyas
Aleksander Madry
SyDa
217
34
0
22 Nov 2022
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Alexander Nikulin
Vladislav Kurenkov
Denis Tarasov
Dmitry Akimov
Sergey Kolesnikov
OffRL
273
19
0
20 Nov 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Ziqiao Wang
Yongyi Mao
317
12
0
19 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
International Conference on Learning Representations (ICLR), 2022
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
285
125
0
07 Nov 2022
Class Interference of Deep Neural Networks
Dongcui Diao
Hengshuai Yao
Bei Jiang
134
1
0
31 Oct 2022
Perturbation Analysis of Neural Collapse
International Conference on Machine Learning (ICML), 2022
Tom Tirer
Haoxiang Huang
Jonathan Niles-Weed
AAML
283
31
0
29 Oct 2022
Deep Neural Networks as the Semi-classical Limit of Topological Quantum Neural Networks: The problem of generalisation
A. Marcianò
De-Wei Chen
Filippo Fabrocini
C. Fields
M. Lulli
Emanuele Zappala
GNN
123
6
0
25 Oct 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun
Konstantinos Papadopoulos
Djamila Aouada
AI4CE
275
19
0
21 Oct 2022
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhiyuan Zhang
Lingjuan Lyu
Jiabo He
Chenguang Wang
Xu Sun
AAML
188
58
0
18 Oct 2022
AnalogVNN: A fully modular framework for modeling and optimizing photonic neural networks
APL Machine Learning (AML), 2022
Vivswan Shah
Nathan Youngblood
213
6
0
14 Oct 2022
Vision Transformers provably learn spatial structure
Neural Information Processing Systems (NeurIPS), 2022
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT
MLT
226
102
0
13 Oct 2022
MSRL: Distributed Reinforcement Learning with Dataflow Fragments
USENIX Annual Technical Conference (USENIX ATC), 2022
Huanzhou Zhu
Bo Zhao
Gang Chen
Weifeng Chen
Yijie Chen
Liang Shi
Yaodong Yang
Peter R. Pietzuch
Lei Chen
OffRL
MoE
208
8
0
03 Oct 2022
Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Pengfei Zheng
Rui Pan
Tarannum Khan
Shivaram Venkataraman
Aditya Akella
263
34
0
30 Sep 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity
Neural Information Processing Systems (NeurIPS), 2022
Benoit Dherin
Michael Munn
M. Rosca
David Barrett
359
44
0
27 Sep 2022
Rethinking Performance Gains in Image Dehazing Networks
Yuda Song
Yang Zhou
Hui Qian
Xin Du
SSeg
178
71
0
23 Sep 2022
Batch Layer Normalization, A new normalization layer for CNNs and RNN
International Conference on Advances in Artificial Intelligence (ICAAI), 2022
A. Ziaee
Erion Çano
161
23
0
19 Sep 2022
On the generalization of learning algorithms that do not converge
Neural Information Processing Systems (NeurIPS), 2022
N. Chandramoorthy
Andreas Loukas
Khashayar Gatmiry
Stefanie Jegelka
MLT
380
12
0
16 Aug 2022
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Jie You
Jaehoon Chung
Mosharaf Chowdhury
295
126
0
12 Aug 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Knowledge Discovery and Data Mining (KDD), 2022
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
222
9
0
19 Jul 2022
Efficient Augmentation for Imbalanced Deep Learning
IEEE International Conference on Data Engineering (ICDE), 2022
Damien Dablain
C. Bellinger
Bartosz Krawczyk
Nitesh Chawla
366
12
0
13 Jul 2022
Towards understanding how momentum improves generalization in deep learning
International Conference on Machine Learning (ICML), 2022
Samy Jelassi
Yuanzhi Li
ODL
MLT
AI4CE
201
46
0
13 Jul 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
IEEE Transactions on Cloud Computing (IEEE TCC), 2022
Lin Zhang
Shaoshuai Shi
Wei Wang
Yue Liu
221
11
0
30 Jun 2022
Disentangling Model Multiplicity in Deep Learning
Ari Heljakka
Martin Trapp
Arno Solin
181
6
0
17 Jun 2022
Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Neural Information Processing Systems (NeurIPS), 2022
Courtney Paquette
Elliot Paquette
Ben Adlam
Jeffrey Pennington
143
19
0
15 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Neural Information Processing Systems (NeurIPS), 2022
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
315
87
0
14 Jun 2022
Towards Understanding Sharpness-Aware Minimization
International Conference on Machine Learning (ICML), 2022
Maksym Andriushchenko
Nicolas Flammarion
AAML
317
178
0
13 Jun 2022
Modeling the Machine Learning Multiverse
Neural Information Processing Systems (NeurIPS), 2022
Samuel J. Bell
Onno P. Kampman
Jesse Dodge
Neil D. Lawrence
251
21
0
13 Jun 2022
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Vimal Thilak
Etai Littwin
Shuangfei Zhai
Omid Saremi
Roni Paiss
J. Susskind
255
74
0
10 Jun 2022
Improved two-stage hate speech classification for twitter based on Deep Neural Networks
Georgios K. Pitsilis
119
0
0
08 Jun 2022
Generalization Error Bounds for Deep Neural Networks Trained by SGD
Mingze Wang
Chao Ma
152
22
0
07 Jun 2022
Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules
Neural Information Processing Systems (NeurIPS), 2022
Yuhan Helena Liu
Arna Ghosh
Blake A. Richards
E. Shea-Brown
Guillaume Lajoie
517
10
0
02 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
341
40
0
01 Jun 2022