On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown

DataFreeShield: Defending Adversarial Attacks without Training Data

Sunjong Park

268

21 Jun 2024

Flat Posterior Does Matter For Bayesian Model Averaging

808

21 Jun 2024

Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization

Tanapat Ratchatorn

Masayuki Tanaka

AAML

282

20 Jun 2024

Information Guided Regularization for Fine-tuning Language Models

293

20 Jun 2024

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

310

20 Jun 2024

DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection

Zhuoxiao Chen

Zixin Wang

Yadan Luo

Sen Wang

Zi Huang

AAML 3DPC

211

19 Jun 2024

Low-Resource Machine Translation through the Lens of Personalized Federated Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Chris Biemann

189

18 Jun 2024

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD

Pierfrancesco Beneventano

Andrea Pinto

Tomaso A. Poggio

MLT

281

17 Jun 2024

What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?

309

14 Jun 2024

When Will Gradient Regularization Be Harmful?

International Conference on Machine Learning (ICML), 2024

149

14 Jun 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

348

12 Jun 2024

Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker-Planck Equation

Shuyu Yin

Fei Wen

Peilin Liu

Tao Luo

278

12 Jun 2024

Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

Jiaxin Deng

Junbiao Pang

Baochang Zhang

489

12 Jun 2024

Agnostic Sharpness-Aware Minimization

Trung Le

397

11 Jun 2024

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Kaiqi Zhang

295

10 Jun 2024

Revisiting Catastrophic Forgetting in Large Language Model Tuning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Meng Fang

216

07 Jun 2024

Error Bounds of Supervised Classification from Information-Theoretic Perspective

Binchuan Qi

Wei Gong

Li Li

283

07 Jun 2024

Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection

Le Li

248

06 Jun 2024

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Rim Shayakhmetov

216

06 Jun 2024

A Universal Class of Sharpness-Aware Minimization Algorithms

384

06 Jun 2024

Cyclic Sparse Training: Is it Enough?

449

04 Jun 2024

Understanding Token Probability Encoding in Output Embeddings

296

03 Jun 2024

Mixup Augmentation with Multiple Interpolations

335

03 Jun 2024

Improving Generalization and Convergence by Enhancing Implicit Regularization

269

31 May 2024

Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning

Jacob Mitchell Springer

Vaishnavh Nagarajan

Aditi Raghunathan

351

30 May 2024

Near Optimal Decentralized Optimization with Compression and Momentum Tracking

Rustem Islamov

Yuan Gao

Sebastian U. Stich

240

30 May 2024

Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

Ziqing Fan

Shengchao Hu

Jiangchao Yao

292

29 May 2024

Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts

Jiangchao Yao

276

29 May 2024

To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability

222

29 May 2024

Visualizing the loss landscape of Self-supervised Vision Transformer

232

28 May 2024

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei

Di Hu

303

28 May 2024

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

339

27 May 2024

MCGAN: Enhancing GAN Training with Regression-Based Generator Loss

611

27 May 2024

The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective

256

27 May 2024

Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency

Bo Han

396

25 May 2024

Does SGD really happen in tiny subspaces?

Minhak Song

Kwangjun Ahn

Chulhee Yun

510

25 May 2024

The Impact of Geometric Complexity on Neural Collapse in Transfer Learning

278

24 May 2024

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

Neural Information Processing Systems (NeurIPS), 2024

Xingwu Sun

...

296

23 May 2024

Worldwide Federated Training of Language Models

358

23 May 2024

Improving Generalization of Deep Neural Networks by Optimum Shifting

AAAI Conference on Artificial Intelligence (AAAI), 2024

178

23 May 2024

Deep linear networks for regression are implicitly regularized towards flat minima

Pierre Marion

Lénaic Chizat

ODL

307

22 May 2024

SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data

363

22 May 2024

Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

Xin-Chun Li

308

21 May 2024

Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks

344

21 May 2024

Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features

364

16 May 2024

MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning

IEEE International Joint Conference on Neural Networks (IJCNN), 2024

Xingyu Li

Bo Tang

VLM CLL

174

15 May 2024

Why is SAM Robust to Label Noise?

International Conference on Learning Representations (ICLR), 2024

310

06 May 2024

Loss Jump During Loss Switch in Solving PDEs with Neural Networks

Communications in Computational Physics (Commun. Comput. Phys.), 2024

201

06 May 2024

A separability-based approach to quantifying generalization: which layer is best?

349

02 May 2024

PackVFL: Efficient HE Packing for Vertical Federated Learning

226

01 May 2024