Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry
ODL · 24 May 2017 · arXiv 1705.08741

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

Showing 50 of 465 citing papers · Page 2 of 10

Spreeze: High-Throughput Parallel Reinforcement Learning Framework
Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang
OffRL · 11 Dec 2023

BCN: Batch Channel Normalization for Image Classification
Afifa Khaled, Chao Li, Jia Ning, Kun He
01 Dec 2023

LEOD: Label-Efficient Object Detection for Event Cameras
Computer Vision and Pattern Recognition (CVPR), 2023
Ziyi Wu, Mathias Gehrig, Qing Lyu, Xudong Liu, Igor Gilitschenski
29 Nov 2023

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling
Naoki Sato, Hideaki Iiduka
15 Nov 2023

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
Neural Information Processing Systems (NeurIPS), 2023
Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang
30 Oct 2023

rTsfNet: a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2023
Yu Enokibori
30 Oct 2023

Stable and Interpretable Deep Learning for Tabular Data: Introducing InterpreTabNet with the Novel InterpreStability Metric
Shiyun Wa, Xinai Lu, Minjuan Wang
04 Oct 2023

YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
International Conference on Compiler Construction (CC), 2023
Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaughn Richard, Yanjing Li
01 Oct 2023

Masked Autoencoders are Scalable Learners of Cellular Morphology
Oren Z. Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, ..., Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw
27 Sep 2023

Deep Model Fusion: A Survey
Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
FedML · MoMe · 27 Sep 2023

Revisiting LARS for Large Batch Training Generalization of Neural Networks
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
K. Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, Quoc-Viet Pham
AI4CE · ODL · 25 Sep 2023

Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang, Jinlong Liu, Zixiang Ding, Lin Guo, W. Lin
AI4CE · 24 Sep 2023

SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, ..., Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing
19 Sep 2023

On the different regimes of Stochastic Gradient Descent
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2023
Antonio Sclocchi, Matthieu Wyart
19 Sep 2023

Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data
Journal of Machine Learning Research (JMLR), 2023
Wanli Hong, Shuyang Ling
18 Sep 2023

No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato, Stavroula Mougiakakou
04 Sep 2023

On the Implicit Bias of Adam
International Conference on Machine Learning (ICML), 2023
M. D. Cattaneo, Jason M. Klusowski, Boris Shigida
31 Aug 2023

FwdLLM: Efficient FedLLM using Forward Gradient
Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, Shangguang Wang
FedML · 26 Aug 2023

Enhancing Generalization of Universal Adversarial Perturbation through Gradient Aggregation
IEEE International Conference on Computer Vision (ICCV), 2023
Xuantong Liu, Yaoyao Zhong, Yuhang Zhang, Lixiong Qin, Weihong Deng
AAML · 11 Aug 2023

G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
Xingyu Li, Bo Tang
AAML · 07 Aug 2023

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang, Ming Hao, Yuhai Shi, Bo Xu
MoMe · 05 Aug 2023

Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, Shaoshuai Shi, Yue Liu
04 Aug 2023

GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning
European Conference on Artificial Intelligence (ECAI), 2023
Soumyadeep Roy, Jonas Wallat, Sowmya S. Sundaram, Wolfgang Nejdl, Niloy Ganguly
29 Jul 2023

The instabilities of large learning rate training: a loss landscape view
Lawrence Wang, Stephen J. Roberts
22 Jul 2023

Addressing caveats of neural persistence with deep graph persistence
Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke
GNN · 20 Jul 2023

Accelerating Distributed ML Training via Selective Synchronization
IEEE International Conference on Cluster Computing (CLUSTER), 2023
S. Tyagi, Martin Swany
FedML · 16 Jul 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You
ODL · 05 Jul 2023

A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks
Neural Information Processing Systems (NeurIPS), 2023
Vignesh Kothapalli, Tom Tirer, Joan Bruna
04 Jul 2023

Black holes and the loss landscape in machine learning
Journal of High Energy Physics (JHEP), 2023
P. Kumar, Taniya Mandal, Swapnamay Mondal
26 Jun 2023

Scaling MLPs: A Tale of Inductive Bias
Neural Information Processing Systems (NeurIPS), 2023
Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
23 Jun 2023

DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Neural Information Processing Systems (NeurIPS), 2023
Niv Giladi, Shahar Gottlieb, Moran Shkolnik, A. Karnieli, Ron Banner, Elad Hoffer, Kfir Y. Levy, Daniel Soudry
18 Jun 2023

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
International Conference on Machine Learning (ICML), 2023
Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
MLT · 14 Jun 2023

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
Shahar Stein Ioushua, Inbar Hasidim, O. Shayevitz, M. Feder
14 Jun 2023

Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2023
Efstathia Soufleri, Gang Yan, Maroun Touma, Jian Li
11 Jun 2023

Anti-Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances in Flat Directions
Marcel Kühn, B. Rosenow
08 Jun 2023

Normalization Layers Are All That Sharpness-Aware Minimization Needs
Neural Information Processing Systems (NeurIPS), 2023
Maximilian Mueller, Tiffany J. Vlaar, David Rolnick, Matthias Hein
07 Jun 2023

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
International Conference on Machine Learning (ICML), 2023
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Weilong Dai, Dacheng Tao
05 Jun 2023

A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha, Zhao Song, Wanrong Zhu
04 Jun 2023

Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution
Jinwuk Seok, Chang-Jae Cho
30 May 2023

SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Lawrence Wang, Stephen J. Roberts
29 May 2023

Ghost Noise for Regularizing Deep Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2023
Atli Kosson, Dongyang Fan, Martin Jaggi
26 May 2023

Batch Model Consolidation: A Multi-Task Model Consolidation Framework
Computer Vision and Pattern Recognition (CVPR), 2023
Iordanis Fostiropoulos, Jiaye Zhu, Laurent Itti
MoMe · CLL · 25 May 2023

On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li
FedML · AAML · 23 May 2023

Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy, R. Guerraoui, Ljiljana Dolamic
20 May 2023

GeNAS: Neural Architecture Search with Better Generalization
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo
15 May 2023

Improving Stain Invariance of CNNs for Segmentation by Fusing Channel Attention and Domain-Adversarial Training
International Conference on Medical Imaging with Deep Learning (MIDL), 2023
Kudaibergen Abutalip, Numan Saeed, Mustaqeem Khan, Abdulmotaleb El Saddik
22 Apr 2023

A Neural Network Transformer Model for Composite Microstructure Homogenization
Engineering Applications of Artificial Intelligence (Eng. Appl. Artif. Intell.), 2023
Emil Pitz, K. Pochiraju
AI4CE · 16 Apr 2023

Deep neural networks have an inbuilt Occam's razor
Nature Communications (Nat. Commun.), 2023
Chris Mingard, Henry Rees, Guillermo Valle Pérez, A. Louis
UQCV · BDL · 13 Apr 2023

SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Neural Information Processing Systems (NeurIPS), 2023
Kfir Y. Levy
FedML · 09 Apr 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
VLM · 07 Apr 2023