On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
arXiv: 1609.04836

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
Temporal horizons in forecasting: a performance-learnability trade-off
Pau Vilimelis Aceituno
Jack William Miller
Noah Marti
Youssef Farag
Victor Boussange
AI4TS
337
1
0
04 Jun 2025
scDataset: Scalable Data Loading for Deep Learning on Large-Scale Single-Cell Omics
Davide D'Ascenzo
Sebastiano Cultrera di Montesano
227
1
0
02 Jun 2025
GradPower: Powering Gradients for Faster Language Model Pre-Training
Mingze Wang
Jinbo Wang
Jiaqi Zhang
Wei Wang
Peng Pei
Xunliang Cai
Weinan E
Lei Wu
201
0
0
30 May 2025
LightSAM: Parameter-Agnostic Sharpness-Aware Minimization
Yifei Cheng
Li Shen
Hao Sun
Nan Yin
Xiaochun Cao
Enhong Chen
AAML
210
0
0
30 May 2025
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
C. Tan
Yubo Zhou
Haishan Ye
Guang Dai
Junmin Liu
Zengjie Song
Jiangshe Zhang
Zixiang Zhao
Yunda Hao
Yong Xu
AAML
246
0
0
29 May 2025
Dynamic Spectral Backpropagation for Efficient Neural Network Training
Mannmohan Muthuraman
282
0
0
29 May 2025
One-Time Soft Alignment Enables Resilient Learning without Weight Transport
Jeonghwan Cheon
Jaehyuk Bae
Se-Bum Paik
ODL
354
2
0
27 May 2025
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
Xiao Chen
Sihang Zhou
K. Liang
Xiaoyu Sun
Xinwang Liu
LRM
203
4
0
24 May 2025
Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD
Dmitry Dudukalov
Artem Logachov
Vladimir Lotov
Timofei Prasolov
Evgeny Prokopenko
Anton Tarasenko
169
0
0
24 May 2025
TRACE for Tracking the Emergence of Semantic Representations in Transformers
Nura Aljaafari
Danilo S. Carvalho
André Freitas
208
0
0
23 May 2025
Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
Punya Syon Pandey
Samuel Simko
Kellin Pelrine
Zhijing Jin
AAML
207
0
0
22 May 2025
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer
Haiduo Huang
Jiangcheng Song
Yadong Zhang
Pengju Ren
287
0
0
21 May 2025
Revealing Language Model Trajectories via Kullback-Leibler Divergence
Ryo Kishino
Yusuke Takase
Momose Oyama
Hiroaki Yamagiwa
Hidetoshi Shimodaira
256
0
0
21 May 2025
Intra-class Patch Swap for Self-Distillation
Hongjun Choi
Eun Som Jeon
Ankita Shukla
Pavan Turaga
247
0
0
20 May 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Xiang He
Dongcheng Zhao
Yang Li
Qingqun Kong
Xin Yang
Yi Zeng
286
0
0
15 May 2025
Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning
Gabriel S. Gama
Valdir Grassi Jr
MoMe
265
0
0
15 May 2025
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
Mamba
394
3
0
13 May 2025
Block-Biased Mamba for Long-Range Sequence Processing
Annan Yu
N. Benjamin Erichson
Mamba
303
2
0
13 May 2025
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
Lianbo Ma
Jianlun Ma
Yuee Zhou
Guoyang Xie
Qiang He
Zhichao Lu
MQ
286
2
0
08 May 2025
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Jianfeng Yao
270
2
0
05 May 2025
Sharpness-Aware Minimization with Z-Score Gradient Filtering
Juyoung Yun
573
0
0
05 May 2025
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
Sicong Li
Qianqian Xu
Zhiyong Yang
Zitai Wang
Li Zhang
Xiaochun Cao
Qingming Huang
385
3
0
03 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Haoran Pan
OffRL, ReLM, LRM
732
152
0
29 Apr 2025
Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks
Konstantinos I. Roumeliotis
Ranjan Sapkota
Manoj Karkee
Nikolaos D. Tselikas
Dimitrios K. Nasiopoulos
303
3
0
29 Apr 2025
FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement
Kangbiao Shi
Yixu Feng
Tao Hu
Yu Cao
Peng Wu
Yijin Liang
Y. Zhang
Qingsen Yan
306
1
0
27 Apr 2025
The effect of the number of parameters and the number of local feature patches on loss landscapes in distributed quantum neural networks
Yoshiaki Kawase
221
0
0
27 Apr 2025
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
440
1
0
25 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
287
0
0
23 Apr 2025
Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation
Meixi Zheng
Kehan Wu
Yanbo Fan
Rui Huang
Baoyuan Wu
AAML
199
0
0
23 Apr 2025
How Effective Can Dropout Be in Multiple Instance Learning ?
Wenhui Zhu
Peijie Qiu
Xiwen Chen
Zhangsihao Yang
Aristeidis Sotiras
Abolfazl Razi
Yanjie Wang
393
2
0
21 Apr 2025
VeLU: Variance-enhanced Learning Unit for Deep Neural Networks
Ashkan Shakarami
Yousef Yeganeh
Azade Farshad
Lorenzo Nicolè
Stefano Ghidoni
Nassir Navab
272
2
0
21 Apr 2025
Dueling Deep Reinforcement Learning for Financial Time Series
Bruno Giorgio
AIFin, AI4TS
170
0
0
15 Apr 2025
An overview of condensation phenomenon in deep learning
Zhi-Qin John Xu
Yaoyu Zhang
Zhangchen Zhou
AI4CE
214
11
0
13 Apr 2025
Sharpness-Aware Parameter Selection for Machine Unlearning
Saber Malekmohammadi
Hong kyu Lee
Li Xiong
MU
1.0K
0
0
08 Apr 2025
Scaling Graph Neural Networks for Particle Track Reconstruction
IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPS), 2025
Alok Tripathy
A. Lazar
X. Ju
P. Calafiura
Katherine Yelick
A. Buluç
225
1
0
07 Apr 2025
Randomised Splitting Methods and Stochastic Gradient Descent
Luke Shaw
Peter A. Whalley
257
2
0
05 Apr 2025
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Computer Vision and Pattern Recognition (CVPR), 2025
Chang-Bin Zhang
Jinhong Ni
Yujie Zhong
Kai Han
3DV, VLM
418
2
0
02 Apr 2025
Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions
Tahmid Hasan Prato
Seijoon Kim
Lizhong Chen
Sanghyun Hong
AAML
281
1
0
02 Apr 2025
Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman
Lucius Bushnaq
Lee D. Sharkey
294
2
0
31 Mar 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao
Yang Shen
Jinyang Guo
Yazhou Yao
Xiansheng Hua
ViT
311
2
0
30 Mar 2025
OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025
S. Tyagi
Prateek Sharma
360
2
0
21 Mar 2025
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Knowledge Discovery and Data Mining (KDD), 2024
Sunwoo Lee
293
2
0
18 Mar 2025
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad
M. Shadaydeh
Joachim Denzler
302
1
0
17 Mar 2025
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
Wei Wei
Yue Shang
Ge Zhang
AI4CE
353
2
0
17 Mar 2025
Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning
Jisoo Kim
Sungmin Kang
Sunwoo Lee
FedML
171
1
0
14 Mar 2025
Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix
Junbiao Pang
Tianyang Cai
318
1
0
14 Mar 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Keyao Zhan
Puheng Li
Lei Wu
MoMe
285
1
0
13 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
291
9
0
07 Mar 2025
Sharpness-Aware Minimization: General Analysis and Improved Rates
International Conference on Learning Representations (ICLR), 2025
Dimitris Oikonomou
Nicolas Loizou
260
7
0
04 Mar 2025
Deep Learning is Not So Mysterious or Different
Andrew Gordon Wilson
313
23
0
03 Mar 2025