On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown

Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent

595

30 Apr 2024

Grad Queue : A probabilistic framework to reinforce sparse gradients

Irfan Mohammad Al Hasib

222

25 Apr 2024

Generalization Measures for Zero-Shot Cross-Lingual Transfer

Saksham Bassi

Duygu Ataman

Kyunghyun Cho

216

24 Apr 2024

A Hybrid Generative and Discriminative PointNet on Unordered Point Sets

Yang Ye

Shihao Ji

PINN 3DPC

247

19 Apr 2024

Singular-limit analysis of gradient descent with noise injection

Anna Shalova

André Schlichting

M. Peletier

222

18 Apr 2024

QGen: On the Ability to Generalize in Quantization Aware Training

Mohammadhossein Askarihemmat

Ahmadreza Jeddi

Reyhane Askari Hemmat

278

17 Apr 2024

Flatness Improves Backbone Generalisation in Few-shot Classification

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

Rui Li

Martin Trapp

Talal Alrawajfeh

Arno Solin

444

11 Apr 2024

Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity

Zhanran Lin

Puheng Li

Lei Wu

473

09 Apr 2024

Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications

Lucas Böttcher

Gregory R. Wheeler

326

05 Apr 2024

Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks

Mohammed Ghaith Altarabichi

Sławomir Nowaczyk

Sepideh Pashami

Peyman Sheikholharam Mashhadi

Julia Handl

169

05 Apr 2024

Information-Theoretic Generalization Bounds for Deep Neural Networks

IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2024

Haiyun He

Christina Lee Yu

443

04 Apr 2024

Make Continual Learning Stronger via C-Flat

271

01 Apr 2024

Revisiting Random Weight Perturbation for Efficiently Improving Generalization

Xiaolin Huang

388

30 Mar 2024

Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context

156

29 Mar 2024

Model Stock: All we need is just a few fine-tuned models

416

28 Mar 2024

On the Benefits of Over-parameterization for Out-of-Distribution Generalization

Yifan Hao

Yong Lin

Difan Zou

Tong Zhang

OODD OOD

246

26 Mar 2024

Self-Supervised Multi-Frame Neural Scene Flow

407

24 Mar 2024

SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images

Yifei Wang

Chuhong Zhu

275

24 Mar 2024

Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning

320

22 Mar 2024

Diversity-Aware Agnostic Ensemble of Sharpness Minimizers

Tung Pham

Trung Le

264

19 Mar 2024

Friendly Sharpness-Aware Minimization

Computer Vision and Pattern Recognition (CVPR), 2024

Xiaolin Huang

274

19 Mar 2024

Semiparametric Token-Sequence Co-Supervision

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Hyunji Lee

276

14 Mar 2024

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

Pierre-Luc Bacon

365

12 Mar 2024

Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning

AAAI Conference on Artificial Intelligence (AAAI), 2024

Minsu Lee

224

11 Mar 2024

Transformers Learn Low Sensitivity Functions: Investigations and Implications

International Conference on Learning Representations (ICLR), 2024

470

11 Mar 2024

CarbonNet: How Computer Vision Plays a Role in Climate Change? Application: Learning Geomechanics from Subsurface Geometry of CCS to Mitigate Global Warming

Journal of Robotics and Automation Research (JRAR), 2024

223

09 Mar 2024

Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets

Lorenzo Brigato

Stavroula Mougiakakou

213

08 Mar 2024

GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models

Tolga Dimlioglu

A. Choromańska

240

07 Mar 2024

Non-Convex Stochastic Composite Optimization with Polyak Momentum

Yuan Gao

Anton Rodomanov

Sebastian U. Stich

304

05 Mar 2024

Level Set Teleportation: An Optimization Perspective

Aaron Mishkin

A. Bietti

Robert Mansel Gower

313

05 Mar 2024

A Survey on Evaluation of Out-of-Distribution Generalization

Peng Cui

323

04 Mar 2024

Merging Text Transformer Models from Different Initializations

Neha Verma

Maha Elbayad

MoMe

371

01 Mar 2024

Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms

Toki Tahmid Inan

Mingrui Liu

Amarda Shehu

223

01 Mar 2024

Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning

Yixiong Zou

235

01 Mar 2024

Fine-tuning with Very Large Dropout

Jianyu Zhang

Léon Bottou

400

01 Mar 2024

Batch size invariant Adam

Xi Wang

Laurence Aitchison

261

29 Feb 2024

Gradient Alignment for Cross-Domain Face Anti-Spoofing

B. Le

Simon S. Woo

CVBM

402

29 Feb 2024

Pre-training Differentially Private Models with Limited Public Data

Zhiqi Bu

Xinwei Zhang

Mingyi Hong

Sheng Zha

George Karypis

302

28 Feb 2024

Unveiling Privacy, Memorization, and Input Curvature Links

297

28 Feb 2024

Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem

Samuel J. K. Chin

Matthias Winkenbach

Akash Srivastava

190

28 Feb 2024

Layer-wise Regularized Dropout for Neural Language Models

Shiwen Ni

Min Yang

Ruifeng Xu

Chengming Li

Xiping Hu

126

26 Feb 2024

Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization

212

24 Feb 2024

Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization

Zirui Zhu

Yong Liu

Zangwei Zheng

Huifeng Guo

Yang You

149

23 Feb 2024

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

Huanran Chen

Zeming Wei

390

23 Feb 2024

NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning

Dhananjay Saikumar

Blesson Varghese

238

21 Feb 2024

Investigating the Histogram Loss in Regression

Ehsan Imani

Kai Luedemann

Sam Scholnick-Hughes

Esraa Elelimy

Martha White

UQCV

165

20 Feb 2024

Scaling physics-informed hard constraints with mixture-of-experts

231

20 Feb 2024

OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations

Yao Shu

Jiongfeng Fang

Y. He

Fei Richard Yu

165

18 Feb 2024

AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods

402

17 Feb 2024

SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

311

15 Feb 2024