On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,654 papers shown

Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

Tom Sander

Maxime Sylvestre

Alain Durmus

214

13 Feb 2024

Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

Bhaskar Ramasubramanian

Bo Li

Radha Poovendran

AAML

217

12 Feb 2024

AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size

Alexander Gasnikov

224

07 Feb 2024

Strong convexity-guided hyper-parameter optimization for flatter losses

Rahul Yedida

Snehanshu Saha

376

07 Feb 2024

Curvature-Informed SGD via General Purpose Lie-Group Preconditioners

Omead Brandon Pooladzandi

Xi-Lin Li

269

07 Feb 2024

Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation

Ossi Raisa

Hibiki Ito

Antti Honkela

289

06 Feb 2024

Deconstructing the Goldilocks Zone of Neural Network Initialization

International Conference on Machine Learning (ICML), 2024

Artem Vysogorets

Anna Dawid

Julia Kempe

260

05 Feb 2024

PanGu-π Pro: Rethinking Optimization and Architecture for Tiny Language Models

158

05 Feb 2024

Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent

Naoki Sato

Hideaki Iiduka

ODL

466

04 Feb 2024

BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

International Journal of Computer Vision (IJCV), 2024

Hongrui Chen

327

26 Jan 2024

Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN

AAAI Conference on Artificial Intelligence (AAAI), 2024

Minsoo Kang

Minkoo Kang

Suhyun Kim

130

24 Jan 2024

DALex: Lexicase-like Selection via Diverse Aggregation

European Conference on Genetic Programming (EuroGP), 2024

Andrew Ni

Lijie Ding

Lee Spector

264

23 Jan 2024

A Precise Characterization of SGD Stability Using Loss Surface Geometry

International Conference on Learning Representations (ICLR), 2024

251

22 Jan 2024

Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data

Leonardo Castro-Gonzalez

322

22 Jan 2024

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Marlon Becker

Frederick Altrock

Benjamin Risse

511

22 Jan 2024

Understanding the Generalization Benefits of Late Learning Rate Decay

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

Yinuo Ren

Chao Ma

Lexing Ying

AI4CE

284

21 Jan 2024

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Yifan Hao

Tong Zhang

AAML

596

19 Jan 2024

Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach

David Fleischhacker

Wolfgang Goederle

Roman Kern

134

15 Jan 2024

Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy

332

14 Jan 2024

EsaCL: Efficient Continual Learning of Sparse Models

SDM, 2024

Weijieying Ren

V. Honavar

CLL

203

11 Jan 2024

Standardizing Your Training Process for Human Activity Recognition Models: A Comprehensive Review in the Tunable Factors

International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous), 2024

141

10 Jan 2024

Preserving Silent Features for Domain Generalization

Chujie Zhao

Tianren Zhang

Feng Chen

282

06 Jan 2024

Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

212

03 Jan 2024

f-Divergence Based Classification: Beyond the Use of Cross-Entropy

International Conference on Machine Learning (ICML), 2024

Nicola Novello

Andrea M. Tonello

352

02 Jan 2024

Hidden Minima in Two-Layer ReLU Networks

Yossi Arjevani

375

28 Dec 2023

Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing

360

22 Dec 2023

CR-SAM: Curvature Regularized Sharpness-Aware Minimization

Tao Wu

Tie Luo

D. C. Wunsch

243

21 Dec 2023

Enhancing Neural Training via a Correlated Dynamics Model

185

20 Dec 2023

LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate

Tao Wu

Tie Luo

D. C. Wunsch

268

20 Dec 2023

Doubly Perturbed Task Free Continual Learning

Byung Hyun Lee

Min-hwan Oh

Se Young Chun

383

20 Dec 2023

Sparse is Enough in Fine-tuning Pre-trained Large Language Models

Bo Du

382

19 Dec 2023

Mixture-of-Linear-Experts for Long-term Time Series Forecasting

292

11 Dec 2023

PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition

199

10 Dec 2023

Cross Domain Generative Augmentation: Domain Generalization with Latent Diffusion Models

Bassel Al Omari

176

08 Dec 2023

Simplifying Neural Network Training Under Class Imbalance

Neural Information Processing Systems (NeurIPS), 2023

276

05 Dec 2023

Optimal Sample Complexity of Contrastive Learning

International Conference on Learning Representations (ICLR), 2023

307

01 Dec 2023

Directions of Curvature as an Explanation for Loss of Plasticity

472

30 Nov 2023

Critical Influence of Overparameterization on Sharpness-aware Minimization

Conference on Uncertainty in Artificial Intelligence (UAI), 2023

Sungbin Shin

Dongyeop Lee

Maksym Andriushchenko

Namhoon Lee

AAML

826

29 Nov 2023

Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing

IEEE Transactions on Communications (IEEE Trans. Commun.), 2023

294

28 Nov 2023

MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning

International Conference on Computing, Networking and Communications (ICNC), 2023

274

28 Nov 2023

Should We Learn Most Likely Functions or Parameters?

Neural Information Processing Systems (NeurIPS), 2023

Shikai Qiu

Tim G. J. Rudner

Sanyam Kapoor

Andrew Gordon Wilson

200

27 Nov 2023

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

International Conference on Machine Learning (ICML), 2023

Mingze Wang

Zeping Min

Lei Wu

517

24 Nov 2023

SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape

308

22 Nov 2023

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Computer Vision and Pattern Recognition (CVPR), 2023

377

22 Nov 2023

Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection

Ahmed Sharshar

Aleksandr Matsun

213

21 Nov 2023

Generalization Bounds for Robust Contrastive Learning: From Theory to Practice

387

16 Nov 2023

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling

Naoki Sato

Hideaki Iiduka

402

15 Nov 2023

A PAC-Bayesian Perspective on the Interpolating Information Criterion

Liam Hodgkinson

Christopher van der Heide

Roberto Salomone

Fred Roosta

Michael W. Mahoney

278

13 Nov 2023

Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment

443

08 Nov 2023

EControl: Fast Distributed Optimization with Compression and Error Control

International Conference on Learning Representations (ICLR), 2023

Yuan Gao

Rustem Islamov

Sebastian U. Stich

296

06 Nov 2023