ResearchTrend.AI


Train longer, generalize better: closing the generalization gap in large batch training of neural networks

24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
    ODL

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Showing 50 of 465 citing papers.
PopulAtion Parameter Averaging (PAPA)
Alexia Jolicoeur-Martineau
Emy Gervais
Kilian Fatras
Yan Zhang
Damien Scieur
MoMe
06 Apr 2023
Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Haoyi Xiong
Xuhong Li
Bo Yu
Zhanxing Zhu
Dongrui Wu
Dejing Dou
NoLa
01 Apr 2023
Solving Regularized Exp, Cosh and Sinh Regression Problems
Zhihang Li
Zhao Song
Wanrong Zhu
28 Mar 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset. IEEE Journal of Translational Engineering in Health and Medicine (IEEE JTEHM), 2023
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE, MedIm
22 Mar 2023
Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization
Peiyuan Zhang
Jiaye Teng
J.N. Zhang
19 Mar 2023
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning. International Conference on Learning Representations (ICLR), 2023
Ziheng Qin
Kaidi Wang
Zangwei Zheng
Jianyang Gu
Xiang Peng
...
Daquan Zhou
Lei Shang
Baigui Sun
Xuansong Xie
Yang You
08 Mar 2023
How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy. Journal of Artificial Intelligence Research (JAIR), 2023
Natalia Ponomareva
Hussein Hazimeh
Alexey Kurakin
Zheng Xu
Carson E. Denison
H. B. McMahan
Sergei Vassilvitskii
Steve Chien
Abhradeep Thakurta
01 Mar 2023
On the Training Instability of Shuffling SGD with Batch Normalization. International Conference on Machine Learning (ICML), 2023
David Wu
Chulhee Yun
S. Sra
24 Feb 2023
MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning. Asian Conference on Computer Vision (ACCV), 2023
Caoyun Fan
Wenqing Chen
Jidong Tian
Yitian Li
Hao He
Yaohui Jin
18 Feb 2023
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability. Neural Information Processing Systems (NeurIPS), 2023
Mathieu Even
Scott Pesme
Suriya Gunasekar
Nicolas Flammarion
17 Feb 2023
Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy
Cheolhyoung Lee
Dong Wang
08 Feb 2023
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning. International Conference on Machine Learning (ICML), 2023
Antonio Sclocchi
Mario Geiger
Matthieu Wyart
31 Jan 2023
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis. International Conference on Machine Learning (ICML), 2023
Axel Sauer
Tero Karras
S. Laine
Andreas Geiger
Timo Aila
23 Jan 2023
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Jaewook Lee
16 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling. IEEE Transactions on Multimedia (IEEE TMM), 2022
Xin Ma
Yu Xie
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
31 Dec 2022
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats. IEEE Winter Conference on Applications of Computer Vision (WACV), 2022
István Sárándi
Alexander Hermans
Bastian Leibe
3DH
29 Dec 2022
Maximal Initial Learning Rates in Deep ReLU Networks. International Conference on Machine Learning (ICML), 2022
Gaurav M. Iyer
Boris Hanin
David Rolnick
14 Dec 2022
FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning. IEEE International Symposium on Workload Characterization (IISWC), 2022
Young Geun Kim
Carole-Jean Wu
FedML
30 Nov 2022
ModelDiff: A Framework for Comparing Learning Algorithms. International Conference on Machine Learning (ICML), 2022
Harshay Shah
Sung Min Park
Andrew Ilyas
Aleksander Madry
SyDa
22 Nov 2022
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Alexander Nikulin
Vladislav Kurenkov
Denis Tarasov
Dmitry Akimov
Sergey Kolesnikov
OffRL
20 Nov 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States. Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Ziqiao Wang
Yongyi Mao
19 Nov 2022
MogaNet: Multi-order Gated Aggregation Network. International Conference on Learning Representations (ICLR), 2022
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
07 Nov 2022
Class Interference of Deep Neural Networks
Dongcui Diao
Hengshuai Yao
Bei Jiang
31 Oct 2022
Perturbation Analysis of Neural Collapse. International Conference on Machine Learning (ICML), 2022
Tom Tirer
Haoxiang Huang
Jonathan Niles-Weed
AAML
29 Oct 2022
Deep Neural Networks as the Semi-classical Limit of Topological Quantum Neural Networks: The problem of generalisation
A. Marcianò
De-Wei Chen
Filippo Fabrocini
C. Fields
M. Lulli
Emanuele Zappala
GNN
25 Oct 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun
Konstantinos Papadopoulos
Djamila Aouada
AI4CE
21 Oct 2022
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zhiyuan Zhang
Lingjuan Lyu
Jiabo He
Chenguang Wang
Xu Sun
AAML
18 Oct 2022
AnalogVNN: A fully modular framework for modeling and optimizing photonic neural networks. APL Machine Learning (AML), 2022
Vivswan Shah
Nathan Youngblood
14 Oct 2022
Vision Transformers provably learn spatial structure. Neural Information Processing Systems (NeurIPS), 2022
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT, MLT
13 Oct 2022
MSRL: Distributed Reinforcement Learning with Dataflow Fragments. USENIX Annual Technical Conference (USENIX ATC), 2022
Huanzhou Zhu
Bo Zhao
Gang Chen
Weifeng Chen
Yijie Chen
Liang Shi
Yaodong Yang
Peter R. Pietzuch
Lei Chen
OffRL, MoE
03 Oct 2022
Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning. Symposium on Networked Systems Design and Implementation (NSDI), 2022
Pengfei Zheng
Rui Pan
Tarannum Khan
Shivaram Venkataraman
Aditya Akella
30 Sep 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity. Neural Information Processing Systems (NeurIPS), 2022
Benoit Dherin
Michael Munn
M. Rosca
David Barrett
27 Sep 2022
Rethinking Performance Gains in Image Dehazing Networks
Yuda Song
Yang Zhou
Hui Qian
Xin Du
SSeg
23 Sep 2022
Batch Layer Normalization, A new normalization layer for CNNs and RNN. International Conference on Advances in Artificial Intelligence (ICAAI), 2022
A. Ziaee
Erion Çano
19 Sep 2022
On the generalization of learning algorithms that do not converge. Neural Information Processing Systems (NeurIPS), 2022
N. Chandramoorthy
Andreas Loukas
Khashayar Gatmiry
Stefanie Jegelka
MLT
16 Aug 2022
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training. Symposium on Networked Systems Design and Implementation (NSDI), 2022
Jie You
Jaehoon Chung
Mosharaf Chowdhury
12 Aug 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale. Knowledge Discovery and Data Mining (KDD), 2022
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
19 Jul 2022
Efficient Augmentation for Imbalanced Deep Learning. IEEE International Conference on Data Engineering (ICDE), 2022
Damien Dablain
C. Bellinger
Bartosz Krawczyk
Nitesh Chawla
13 Jul 2022
Towards understanding how momentum improves generalization in deep learning. International Conference on Machine Learning (ICML), 2022
Samy Jelassi
Yuanzhi Li
ODL, MLT, AI4CE
13 Jul 2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning. IEEE Transactions on Cloud Computing (IEEE TCC), 2022
Lin Zhang
Shaoshuai Shi
Wei Wang
Yue Liu
30 Jun 2022
Disentangling Model Multiplicity in Deep Learning
Ari Heljakka
Martin Trapp
Arno Solin
17 Jun 2022
Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions. Neural Information Processing Systems (NeurIPS), 2022
Courtney Paquette
Elliot Paquette
Ben Adlam
Jeffrey Pennington
15 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction. Neural Information Processing Systems (NeurIPS), 2022
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
14 Jun 2022
Towards Understanding Sharpness-Aware Minimization. International Conference on Machine Learning (ICML), 2022
Maksym Andriushchenko
Nicolas Flammarion
AAML
13 Jun 2022
Modeling the Machine Learning Multiverse. Neural Information Processing Systems (NeurIPS), 2022
Samuel J. Bell
Onno P. Kampman
Jesse Dodge
Neil D. Lawrence
13 Jun 2022
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Vimal Thilak
Etai Littwin
Shuangfei Zhai
Omid Saremi
Roni Paiss
J. Susskind
10 Jun 2022
Improved two-stage hate speech classification for twitter based on Deep Neural Networks
Georgios K. Pitsilis
08 Jun 2022
Generalization Error Bounds for Deep Neural Networks Trained by SGD
Mingze Wang
Chao Ma
07 Jun 2022
Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules. Neural Information Processing Systems (NeurIPS), 2022
Yuhan Helena Liu
Arna Ghosh
Blake A. Richards
E. Shea-Brown
Guillaume Lajoie
02 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
01 Jun 2022
Page 3 of 10