1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,652 papers shown
Title
Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization
Youngsik Yun
Dongjun Gu
Youngjung Uh
108
0
0
22 Nov 2025
A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias
Wei-Kai Chang
Rajiv Khanna
MLT
128
0
0
21 Nov 2025
Fast LLM Post-training via Decoupled and Best-of-N Speculation
Rongxin Cheng
Kai Zhou
Xingda Wei
Siyuan Liu
Mingcong Han
...
Yeju Zhou
Baoquan Zhong
W. L. Xiao
Rong Chen
Haibo Chen
OffRL
LRM
330
0
0
20 Nov 2025
Forecasting Thermospheric Density with Transformers for Multi-Satellite Orbit Management
Cedric Bös
Alessandro Bortotto
Mohamed Khalil Ben-Larbi
28
0
0
08 Nov 2025
Sharp Minima Can Generalize: A Loss Landscape Perspective On Data
Raymond Fan
Bryce Sandlund
Lin Myat Ko
80
0
0
06 Nov 2025
Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers
C. Hepburn
T. Zielke
A.P. Raulf
143
0
0
06 Nov 2025
Flat Minima and Generalization: Insights from Stochastic Convex Optimization
Matan Schliserman
Shira Vansover-Hager
Tomer Koren
72
0
0
05 Nov 2025
Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering
Hossein Abdi
Mingfei Sun
Wei Pan
VLM
197
0
0
03 Nov 2025
The Curvature Rate λ: A Scalar Measure of Input-Space Sharpness in Neural Networks
Jacob Poschl
90
0
0
03 Nov 2025
A Framework Based on Graph Cellular Automata for Similarity Evaluation in Urban Spatial Networks
Peiru Wu
Maojun Zhai
Lingzhu Zhang
84
0
0
02 Nov 2025
DP-FedPGN: Finding Global Flat Minima for Differentially Private Federated Learning via Penalizing Gradient Norm
Junkang Liu
Yuxuan Tian
Fanhua Shang
Yuanyuan Liu
Hongying Liu
Junchao Zhou
Daorui Ding
FedML
229
2
0
31 Oct 2025
Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition
Shuyan Lyu
Zhanzimo Wu
Junliang Du
76
0
0
31 Oct 2025
Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules
John J. Vastola
Samuel J. Gershman
K. Rajan
93
1
0
30 Oct 2025
A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks
T. Hrycej
Bernhard Bermeitinger
Massimo Pavone
Götz-Henrik Wiegand
Siegfried Handschuh
48
0
0
29 Oct 2025
Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks
Florian A. Hölzl
Daniel Rueckert
Georgios Kaissis
77
0
0
29 Oct 2025
From Memorization to Reasoning in the Spectrum of Loss Curvature
Jack Merullo
Srihita Vatsavaya
Lucius Bushnaq
Owen Lewis
170
0
0
28 Oct 2025
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein R. Nowdeh
Jie Ji
Xiaolong Ma
Fatemeh Afghah
92
0
0
28 Oct 2025
More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning
Wanhao Yu
Zheng Wang
Shuteng Niu
Sen Lin
Li Yang
CLL
220
0
0
23 Oct 2025
Convergence Analysis of SGD under Expected Smoothness
Yuta Kawamoto
Hideaki Iiduka
144
0
0
23 Oct 2025
Position: Many generalization measures for deep learning are fragile
Shuofeng Zhang
A. Louis
AAML
210
0
0
21 Oct 2025
A Unified Perspective on Optimization in Machine Learning and Neuroscience: From Gradient Descent to Neural Adaptation
Jesus Garcia Fernandez
Nasir Ahmad
Marcel van Gerven
AI4CE
225
0
0
21 Oct 2025
Stochastic Difference-of-Convex Optimization with Momentum
El Mahdi Chayti
Martin Jaggi
84
0
0
20 Oct 2025
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
Zheng Huang
Enpei Zhang
Yinghao Cai
Weikang Qiu
Carl Yang
Elynn Chen
Xiang Zhang
Rex Ying
Dawei Zhou
Yujun Yan
DiffM
88
0
0
17 Oct 2025
When Flatness Does (Not) Guarantee Adversarial Robustness
Nils Philipp Walter
Linara Adilova
Jilles Vreeken
Michael Kamp
84
1
0
16 Oct 2025
DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai
Keqiang He
An Wang
76
0
0
09 Oct 2025
AppForge: From Assistant to Independent Developer - Are GPTs Ready for Software Development?
Dezhi Ran
Yuan Cao
Mengzhou Wu
Simin Chen
Yuzhe Guo
...
Jialei Wei
Linyi Li
Wei Yang
Baishakhi Ray
Tao Xie
LLMAG
ALM
ELM
88
0
0
09 Oct 2025
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
Xiaohui Li
Shaobin Zhuang
Shuo Cao
Yang Yang
Yuandong Pu
Qi Qin
Siqi Luo
Bin Fu
Yihao Liu
DiffM
140
0
0
09 Oct 2025
Unveiling the Power of Multiple Gossip Steps: A Stability-Based Generalization Analysis in Decentralized Training
Qinglun Li
Yingqi Liu
Miao Zhang
Xiaochun Cao
Quanjun Yin
Li Shen
84
0
0
09 Oct 2025
Adjusting Initial Noise to Mitigate Memorization in Text-to-Image Diffusion Models
Hyeonggeun Han
Sehwan Kim
Hyungjun Joo
Sangwoo Hong
Jungwoo Lee
DiffM
153
1
0
08 Oct 2025
The Effect of Label Noise on the Information Content of Neural Representations
Ali Hussaini Umar
Franky Kevin Nando Tezoh
Jean Barbier
Santiago Acevedo
Alessandro Laio
SSL
NoLa
186
0
0
07 Oct 2025
How does the optimizer implicitly bias the model merging loss landscape?
Chenxiang Zhang
Alexander Theus
Damien Teney
Antonio Orvieto
Jun Pang
S. Mauw
MoMe
162
0
0
06 Oct 2025
Categorical Invariants of Learning Dynamics
Abdulrahman Tamim
OOD
101
0
0
05 Oct 2025
Adaptively Sampling-Reusing-Mixing Decomposed Gradients to Speed Up Sharpness Aware Minimization
Jiaxin Deng
Junbiao Pang
128
0
0
04 Oct 2025
Optimal Scaling Needs Optimal Norm
Oleg Filatov
Jiangtao Wang
J. Ebert
Stefan Kesselheim
132
1
0
04 Oct 2025
Flatness-Aware Stochastic Gradient Langevin Dynamics
Stefano Bruno
Youngsik Hwang
Jaehyeon An
Sotirios Sabanis
Dong-Young Lim
140
0
0
02 Oct 2025
How Does Preconditioning Guide Feature Learning in Deep Neural Networks?
Kotaro Yoshida
Atsushi Nitanda
162
0
0
30 Sep 2025
Hybrid Dual-Batch and Cyclic Progressive Learning for Efficient Distributed Training
Kuan-Wei Lu
Ding-Yong Hong
Pangfeng Liu
Jan-Jan Wu
94
0
0
30 Sep 2025
Sharpness of Minima in Deep Matrix Factorization: Exact Expressions
Anil Kamber
Rahul Parhi
FAtt
299
0
0
30 Sep 2025
Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier
Gaojie Jin
Xinping Yi
Xiaowei Huang
AAML
93
1
0
30 Sep 2025
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
Guancheng Wan
Lucheng Fu
Haoxin Liu
Yiqiao Jin
Hui Yi Leong
...
Yunpu Ma
Xiangru Tang
B. A. Prakash
Yizhou Sun
Wei Wang
KELM
75
0
0
28 Sep 2025
Dynamics of Learning: Generative Schedules from Latent ODEs
Matt L. Sampson
Peter Melchior
80
0
0
27 Sep 2025
Fine-tuning Done Right in Model Editing
Wanli Yang
Fei Sun
Rui Tang
Hongyu Zang
Du Su
Qi Cao
Jingang Wang
Huawei Shen
Xueqi Cheng
KELM
132
0
0
26 Sep 2025
Sharpness-Aware Minimization Can Hallucinate Minimizers
Chanwoong Park
Uijeong Jang
Ernest K. Ryu
Insoon Yang
95
0
0
26 Sep 2025
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
Sahar Dastani
Ali Bahri
G. A. V. Hakim
Moslem Yazdanpanah
Mehrdad Noori
David Osowiechi
Samuel Barbeau
Ismail Ben Ayed
H. Lombaert
Christian Desrosiers
TTA
183
0
0
26 Sep 2025
A Unified Noise-Curvature View of Loss of Trainability
Gunbir Singh Baveja
Alex Lewandowski
Mark Schmidt
133
0
0
24 Sep 2025
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
T. Han
Linara Adilova
Henning Petzka
Jens Kleesiek
Michael Kamp
201
1
0
22 Sep 2025
Neural Network Based Framework for Passive Intermodulation Cancellation in MIMO Systems
Xiaolong Li
Z. Xu
Peiting You
Yifei Zhu
120
0
0
21 Sep 2025
DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen
Yian Wang
Hari Sundaram
84
0
0
19 Sep 2025
MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training
Junbiao Pang
Tianyang Cai
Baochang Zhang
MQ
108
0
0
19 Sep 2025
Pre-training under infinite compute
Konwoo Kim
Suhas Kotha
Abigail Z. Jacobs
Tatsunori Hashimoto
188
1
0
18 Sep 2025