ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
Rongxin Cheng
Kai Zhou
Xingda Wei
Siyuan Liu
Mingcong Han
...
Yeju Zhou
Baoquan Zhong
W. L. Xiao
Rong Chen
Haibo Chen
OffRL, LRM
436
0
0
24 Dec 2025
Data Curation Through the Lens of Spectral Dynamics: Static Limits, Dynamic Acceleration, and Practical Oracles
Yizhou Zhang
Lun Du
132
0
0
02 Dec 2025
Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization
Youngsik Yun
Dongjun Gu
Youngjung Uh
144
0
0
22 Nov 2025
A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias
Wei-Kai Chang
Rajiv Khanna
MLT
196
0
0
21 Nov 2025
Forecasting Thermospheric Density with Transformers for Multi-Satellite Orbit Management
Cedric Bös
Alessandro Bortotto
Mohamed Khalil Ben-Larbi
45
0
0
08 Nov 2025
Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers
C. Hepburn
T. Zielke
A.P. Raulf
175
0
0
06 Nov 2025
Sharp Minima Can Generalize: A Loss Landscape Perspective On Data
Raymond Fan
Bryce Sandlund
Lin Myat Ko
126
0
0
06 Nov 2025
Flat Minima and Generalization: Insights from Stochastic Convex Optimization
Matan Schliserman
Shira Vansover-Hager
Tomer Koren
112
0
0
05 Nov 2025
The Curvature Rate λ: A Scalar Measure of Input-Space Sharpness in Neural Networks
Jacob Poschl
173
0
0
03 Nov 2025
Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering
Hossein Abdi
Mingfei Sun
Wei Pan
VLM
235
0
0
03 Nov 2025
A Framework Based on Graph Cellular Automata for Similarity Evaluation in Urban Spatial Networks
Peiru Wu
Maojun Zhai
Lingzhu Zhang
124
0
0
02 Nov 2025
Information-Theoretic Greedy Layer-wise Training for Traffic Sign Recognition
Shuyan Lyu
Zhanzimo Wu
Junliang Du
114
0
0
31 Oct 2025
DP-FedPGN: Finding Global Flat Minima for Differentially Private Federated Learning via Penalizing Gradient Norm
Junkang Liu
Yuxuan Tian
Fanhua Shang
Yuanyuan Liu
Hongying Liu
Junchao Zhou
Daorui Ding
FedML
275
2
0
31 Oct 2025
Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules
John J. Vastola
Samuel J. Gershman
K. Rajan
124
1
0
30 Oct 2025
Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks
Florian A. Hölzl
Daniel Rueckert
Georgios Kaissis
142
0
0
29 Oct 2025
A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks
T. Hrycej
Bernhard Bermeitinger
Massimo Pavone
Götz-Henrik Wiegand
Siegfried Handschuh
71
0
0
29 Oct 2025
From Memorization to Reasoning in the Spectrum of Loss Curvature
Jack Merullo
Srihita Vatsavaya
Lucius Bushnaq
Owen Lewis
210
1
0
28 Oct 2025
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein R. Nowdeh
Jie Ji
Xiaolong Ma
Fatemeh Afghah
139
0
0
28 Oct 2025
More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning
Wanhao Yu
Zheng Wang
Shuteng Niu
Sen Lin
Li Yang
CLL
243
0
0
23 Oct 2025
Convergence Analysis of SGD under Expected Smoothness
Yuta Kawamoto
Hideaki Iiduka
148
0
0
23 Oct 2025
Position: Many generalization measures for deep learning are fragile
Shuofeng Zhang
A. Louis
AAML
279
0
0
21 Oct 2025
A Unified Perspective on Optimization in Machine Learning and Neuroscience: From Gradient Descent to Neural Adaptation
Jesus Garcia Fernandez
Nasir Ahmad
Marcel van Gerven
AI4CE
259
0
0
21 Oct 2025
Stochastic Difference-of-Convex Optimization with Momentum
El Mahdi Chayti
Martin Jaggi
119
0
0
20 Oct 2025
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
Zheng Huang
Enpei Zhang
Yinghao Cai
Weikang Qiu
Carl Yang
Elynn Chen
Xiang Zhang
Rex Ying
Dawei Zhou
Yujun Yan
DiffM
128
0
0
17 Oct 2025
When Flatness Does (Not) Guarantee Adversarial Robustness
Nils Philipp Walter
Linara Adilova
Jilles Vreeken
Michael Kamp
134
1
0
16 Oct 2025
DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai
Keqiang He
An Wang
123
0
0
09 Oct 2025
Unveiling the Power of Multiple Gossip Steps: A Stability-Based Generalization Analysis in Decentralized Training
Qinglun Li
Yingqi Liu
Miao Zhang
Xiaochun Cao
Quanjun Yin
Li Shen
114
0
0
09 Oct 2025
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
Xiaohui Li
Shaobin Zhuang
Shuo Cao
Yang Yang
Yuandong Pu
Qi Qin
Siqi Luo
Bin Fu
Yihao Liu
DiffM
199
0
0
09 Oct 2025
AppForge: From Assistant to Independent Developer - Are GPTs Ready for Software Development?
Dezhi Ran
Yuan Cao
Mengzhou Wu
Simin Chen
Yuzhe Guo
...
Jialei Wei
Linyi Li
Wei Yang
Baishakhi Ray
Tao Xie
LLMAG, ALM, ELM
116
0
0
09 Oct 2025
Adjusting Initial Noise to Mitigate Memorization in Text-to-Image Diffusion Models
Hyeonggeun Han
Sehwan Kim
Hyungjun Joo
Sangwoo Hong
Jungwoo Lee
DiffM
186
1
0
08 Oct 2025
The Effect of Label Noise on the Information Content of Neural Representations
Ali Hussaini Umar
Franky Kevin Nando Tezoh
Jean Barbier
Santiago Acevedo
Alessandro Laio
SSL, NoLa
230
0
0
07 Oct 2025
How does the optimizer implicitly bias the model merging loss landscape?
Chenxiang Zhang
Alexander Theus
Damien Teney
Antonio Orvieto
Jun Pang
S. Mauw
MoMe
189
1
0
06 Oct 2025
Categorical Invariants of Learning Dynamics
Abdulrahman Tamim
OOD
116
0
0
05 Oct 2025
Adaptively Sampling-Reusing-Mixing Decomposed Gradients to Speed Up Sharpness Aware Minimization
Jiaxin Deng
Junbiao Pang
164
0
0
04 Oct 2025
Optimal Scaling Needs Optimal Norm
Oleg Filatov
Jiangtao Wang
J. Ebert
Stefan Kesselheim
166
2
0
04 Oct 2025
Flatness-Aware Stochastic Gradient Langevin Dynamics
Stefano Bruno
Youngsik Hwang
Jaehyeon An
Sotirios Sabanis
Dong-Young Lim
176
0
0
02 Oct 2025
How Does Preconditioning Guide Feature Learning in Deep Neural Networks?
Kotaro Yoshida
Atsushi Nitanda
243
0
0
30 Sep 2025
Hybrid Dual-Batch and Cyclic Progressive Learning for Efficient Distributed Training
Kuan-Wei Lu
Ding-Yong Hong
Pangfeng Liu
Jan-Jan Wu
124
0
0
30 Sep 2025
Sharpness of Minima in Deep Matrix Factorization: Exact Expressions
Anil Kamber
Rahul Parhi
FAtt
350
0
0
30 Sep 2025
Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier
Gaojie Jin
Xinping Yi
Xiaowei Huang
AAML
136
1
0
30 Sep 2025
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
Guancheng Wan
Lucheng Fu
Haoxin Liu
Yiqiao Jin
Hui Yi Leong
...
Yunpu Ma
Xiangru Tang
B. A. Prakash
Yizhou Sun
Wei Wang
KELM
155
0
0
28 Sep 2025
Dynamics of Learning: Generative Schedules from Latent ODEs
Matt L. Sampson
Peter Melchior
117
0
0
27 Sep 2025
Fine-tuning Done Right in Model Editing
Wanli Yang
Fei Sun
Rui Tang
Hongyu Zang
Du Su
Qi Cao
Jingang Wang
Huawei Shen
Xueqi Cheng
KELM
180
0
0
26 Sep 2025
Sharpness-Aware Minimization Can Hallucinate Minimizers
Chanwoong Park
Uijeong Jang
Ernest K. Ryu
Insoon Yang
158
0
0
26 Sep 2025
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
Sahar Dastani
Ali Bahri
G. A. V. Hakim
Moslem Yazdanpanah
Mehrdad Noori
David Osowiechi
Samuel Barbeau
Ismail Ben Ayed
H. Lombaert
Christian Desrosiers
TTA
227
0
0
26 Sep 2025
A Unified Noise-Curvature View of Loss of Trainability
Gunbir Singh Baveja
Alex Lewandowski
Mark Schmidt
184
0
0
24 Sep 2025
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
T. Han
Linara Adilova
Henning Petzka
Jens Kleesiek
Michael Kamp
244
1
0
22 Sep 2025
Neural Network Based Framework for Passive Intermodulation Cancellation in MIMO Systems
Xiaolong Li
Z. Xu
Peiting You
Yifei Zhu
162
0
0
21 Sep 2025
DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation
Yuen Chen
Yian Wang
Hari Sundaram
106
0
0
19 Sep 2025
MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training
Junbiao Pang
Tianyang Cai
Baochang Zhang
MQ
124
0
0
19 Sep 2025