1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,653 papers shown
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization
Jake B. Perazzone
Maroun Touma
Mingyue Ji
Kevin S. Chan
FedML
377
1
0
01 Mar 2025
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael
Iftach Arbel
Ofir Lindenbaum
Tom Tirer
405
2
0
26 Feb 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
1.2K
2
0
25 Feb 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
440
2
0
21 Feb 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min Lin
Ye Wang
DiffM
TDI
611
93
0
21 Feb 2025
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
International Conference on Learning Representations (ICLR), 2025
Romain Chor
Milad Sefidgaran
Piotr Krasnowski
466
3
0
21 Feb 2025
Unveiling Mode Connectivity in Graph Neural Networks
Bingheng Li
Z. Chen
Haoyu Han
Shenglai Zeng
J. Liu
Shucheng Zhou
236
1
0
18 Feb 2025
Improving the Stability of GNN Force Field Models by Reducing Feature Correlation
Y. Zeng
Wenlong He
Ihor Vasyltsov
Jiaxin Wei
Ying Zhang
Lin Chen
Yuehua Dai
190
0
0
18 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
268
2
0
18 Feb 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin
Yingjie Lao
Tong Geng
Tan Yu
Weijie Zhao
AAML
SILM
455
7
0
18 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
379
11
0
17 Feb 2025
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
International Conference on Learning Representations (ICLR), 2025
Xiuyuan Hu
Guoqing Liu
Can Chen
Yang Zhao
Ning Yang
Xue Liu
258
5
0
07 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
274
0
0
28 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
620
168
0
28 Jan 2025
On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre
Suset Barroso-Solares
M.A. Rodríguez-Pérez
Javier Pinto
233
7
0
25 Jan 2025
Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
Haocheng Luo
Tuan Truong
Tung Pham
Mehrtash Harandi
Dinh Q. Phung
Trung Le
228
11
0
22 Jan 2025
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
North American Actuarial Journal (NAAJ), 2023
Freek Holvoet
Katrien Antonio
Roel Henckaerts
380
7
0
20 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano
Blake Woodworth
MLT
398
2
0
15 Jan 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Yilang Zhang
Bingcong Li
G. Giannakis
AAML
195
0
0
11 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
326
8
0
10 Jan 2025
Towards Unraveling and Improving Generalization in World Models
Qiaoyi Fang
Weiyu Du
Hang Wang
Junshan Zhang
OOD
227
1
0
03 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
1.2K
0
0
30 Dec 2024
Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities
Lawrence Wang
Stephen J. Roberts
257
0
0
23 Dec 2024
Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jinping Zou
Xiaoge Deng
Tao Sun
320
1
0
22 Dec 2024
SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise SAM
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xingyu Lyu
Qianqian Xu
Zhiyong Yang
Shaojie Lyu
Qingming Huang
460
1
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Xingtai Lv
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VLM
343
3
0
18 Dec 2024
Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration
D. Wolf
Alexander Braun
Markus Ulrich
501
0
0
18 Dec 2024
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Computer Vision and Pattern Recognition (CVPR), 2024
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
1.1K
6
0
18 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
International Conference on Learning Representations (ICLR), 2024
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
434
29
0
17 Dec 2024
LossLens: Diagnostics for Machine Learning through Loss Landscape Visual Analytics
IEEE Computer Graphics and Applications (IEEE CG&A), 2024
Tiankai Xie
Jiaqing Chen
Yaoqing Yang
Caleb Geniesse
Ge Shi
...
J. Cava
Michael W. Mahoney
Talita Perciano
Gunther H. Weber
Ross Maciejewski
247
1
0
17 Dec 2024
A Method for Enhancing Generalization of Adam by Multiple Integrations
AAAI Conference on Artificial Intelligence (AAAI), 2024
Long Jin
Han Nong
Liangming Chen
Zhenming Su
245
0
0
17 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Zhaoyu Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
947
1
0
16 Dec 2024
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Bohan Li
Jianfeng Dong
Jiadong Wang
Yukai Shi
Yasheng Sun
...
Zhuang Ma
Baao Xie
Chao Ma
Yunbo Wang
Wenjun Zeng
DiffM
856
4
0
15 Dec 2024
Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization
Samuel Schapiro
Han Zhao
357
1
0
06 Dec 2024
Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits
Daniel Morales-Brotons
Thijs Vogels
Aymeric Dieuleveut
364
62
0
27 Nov 2024
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Neural Information Processing Systems (NeurIPS), 2024
Yunzhe Hu
Difan Zou
Dong Xu
375
3
0
26 Nov 2024
AI-Spectra: A Visual Dashboard for Model Multiplicity to Enhance Informed and Transparent Decision-Making
Engineering Interactive Computing System (EICS), 2024
Gilles Eerlings
Sebe Vanbrabant
Jori Liesenborgs
Gustavo Rovelo Ruiz
Davy Vanacken
Kris Luyten
231
5
0
14 Nov 2024
Enhancing generalization in high energy physics using white-box adversarial attacks
Franck Rothen
Samuel Klein
Matthew Leigh
J. A. Raine
AAML
299
1
0
14 Nov 2024
LA4SR: illuminating the dark proteome with generative AI
David R. Nelson
Ashish Kumar Jaiswal
Noha Ismail
Alexandra Mystikou
Kourosh Salehi-Ashtiani
165
0
0
11 Nov 2024
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
308
15
0
05 Nov 2024
R+R: Understanding Hyperparameter Effects in DP-SGD
Asia-Pacific Computer Systems Architecture Conference (ACSA), 2024
Felix Morsbach
J. Reubold
T. Strufe
207
1
0
04 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Neural Information Processing Systems (NeurIPS), 2024
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
455
3
0
04 Nov 2024
1st-Order Magic: Analysis of Sharpness-Aware Minimization
Nalin Tiwary
Siddarth Aananth
138
0
0
03 Nov 2024
Simplicity Bias via Global Convergence of Sharpness Minimization
International Conference on Machine Learning (ICML), 2024
Khashayar Gatmiry
Zhiyuan Li
Sashank J. Reddi
Stefanie Jegelka
250
2
0
21 Oct 2024
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Neural Information Processing Systems (NeurIPS), 2024
Bingcong Li
Liang Zhang
Niao He
272
9
0
18 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2024
R. Teo
Tan M. Nguyen
MoE
207
6
0
18 Oct 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
230
8
0
17 Oct 2024
Deep Model Merging: The Sister of Neural Network Interpretability -- A Survey
A. Khan
Todd Nief
Nathaniel Hudson
Mansi Sakarvadia
Daniel Grzenda
Aswathy Ajith
Jordan Pettyjohn
Kyle Chard
Ian Foster
MoMe
155
1
0
16 Oct 2024
From promise to practice: realizing high-performance decentralized training
International Conference on Learning Representations (ICLR), 2024
Zesen Wang
Jiaojiao Zhang
Xuyang Wu
M. Johansson
298
2
0
15 Oct 2024
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
Arpan Mukherjee
Shashanka Ubaru
K. Murugesan
Karthikeyan Shanmugam
A. Tajer
258
5
0
14 Oct 2024