
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
arXiv:1609.04836 · 15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization
Youngjun Song, Youngsik Hwang, Jonghun Lee, Heechang Lee, Dong-Young Lim · AAML · 01 Jul 2025
Single-shot thermometry of simulated Bose–Einstein condensates using artificial intelligence
Jack Griffiths, Steven A. Wrathmall, Simon A. Gardiner · 20 Jun 2025
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
Devin Kwok, Gül Sena Altıntaş, Colin Raffel, David Rolnick · 16 Jun 2025
From Sharpness to Better Generalization for Speech Deepfake Detection
Wen-Chin Huang, Xuechen Liu, Xin Eric Wang, Junichi Yamagishi, Yanmin Qian · 13 Jun 2025
Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel
Yilan Chen, Zhichao Wang, Wei Huang, Andi Han, Taiji Suzuki, Arya Mazumdar · MLT · 12 Jun 2025
FEDTAIL: Federated Long-Tailed Domain Generalization with Sharpness-Guided Gradient Matching
Sunny Gupta, Nikita Jangid, Shounak Das, Amit Sethi · FedML · 10 Jun 2025
Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models
Ngoc-Quan Pham, Tuan Truong, Quyen Tran, T. H. Nguyen, Dinh Q. Phung, T. Le · 08 Jun 2025
SAFE: Finding Sparse and Flat Minima to Improve Pruning
Dongyeop Lee, Kwanhee Lee, Jinseok Chung, Namhoon Lee · 07 Jun 2025
Towards Better Generalization via Distributional Input Projection Network
Yifan Hao, Yanxin Lu, Xinwei Shen, Tong Zhang · 05 Jun 2025
Temporal horizons in forecasting: a performance-learnability trade-off
Pau Vilimelis Aceituno, Jack William Miller, Noah Marti, Youssef Farag, Victor Boussange · AI4TS · 04 Jun 2025
scDataset: Scalable Data Loading for Deep Learning on Large-Scale Single-Cell Omics
Davide D'Ascenzo, Sebastiano Cultrera di Montesano · 02 Jun 2025
LightSAM: Parameter-Agnostic Sharpness-Aware Minimization
Yifei Cheng, Li Shen, Hao Sun, Nan Yin, Xiaochun Cao, Enhong Chen · AAML · 30 May 2025
GradPower: Powering Gradients for Faster Language Model Pre-Training
Mingze Wang, Jinbo Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu · 30 May 2025
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
C. Tan, Yubo Zhou, Haishan Ye, Guang Dai, Junmin Liu, Zengjie Song, Jiangshe Zhang, Zixiang Zhao, Yunda Hao, Yong Xu · AAML · 29 May 2025
Dynamic Spectral Backpropagation for Efficient Neural Network Training
Mannmohan Muthuraman · 29 May 2025
One-Time Soft Alignment Enables Resilient Learning without Weight Transport
Jeonghwan Cheon, Jaehyuk Bae, Se-Bum Paik · ODL · 27 May 2025
Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD
Dmitry Dudukalov, Artem Logachov, Vladimir Lotov, Timofei Prasolov, Evgeny Prokopenko, Anton Tarasenko · 24 May 2025
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
Xiao Chen, Sihang Zhou, K. Liang, Xiaoyu Sun, Xinwang Liu · LRM · 24 May 2025
TRACE for Tracking the Emergence of Semantic Representations in Transformers
Nura Aljaafari, Danilo S. Carvalho, André Freitas · 23 May 2025
Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability
Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin · AAML · 22 May 2025
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer
Haiduo Huang, Jiangcheng Song, Yadong Zhang, Pengju Ren · 21 May 2025
Revealing Language Model Trajectories via Kullback-Leibler Divergence
Ryo Kishino, Yusuke Takase, Momose Oyama, Hiroaki Yamagiwa, Hidetoshi Shimodaira · 21 May 2025
Intra-class Patch Swap for Self-Distillation
Hongjun Choi, Eun Som Jeon, Ankita Shukla, Pavan Turaga · 20 May 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Xiang He, Dongcheng Zhao, Yang Li, Qingqun Kong, Xin Yang, Yi Zeng · 15 May 2025
Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning
Gabriel S. Gama, Valdir Grassi Jr · MoMe · 15 May 2025
Block-Biased Mamba for Long-Range Sequence Processing
Annan Yu, N. Benjamin Erichson · Mamba · 13 May 2025
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu · MQ · 08 May 2025
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong, Yushun Zhang, Zhi-Quan Luo, Jianfeng Yao, Ruoyu Sun · 05 May 2025
Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks
Juyoung Yun · 05 May 2025
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
Sicong Li, Qianqian Xu, Zhiyong Yang, Zitai Wang, Li Zhang, Xiaochun Cao, Qingming Huang · 03 May 2025
Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks
Konstantinos I. Roumeliotis, Ranjan Sapkota, Manoj Karkee, Nikolaos D. Tselikas, Dimitrios K. Nasiopoulos · 29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Liu Liu, ..., Jianfeng Gao, Weizhu Chen, Shuaiqiang Wang, Simon Shaolei Du, Yelong Shen · OffRL, ReLM, LRM · 29 Apr 2025
FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement
Kangbiao Shi, Yixu Feng, Tao Hu, Yu Cao, Peng Wu, Yijin Liang, Y. Zhang, Qingsen Yan · 27 Apr 2025
The effect of the number of parameters and the number of local feature patches on loss landscapes in distributed quantum neural networks
Yoshiaki Kawase · 27 Apr 2025
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Philipp A. Witte, Russell J. Hewett, Yin Tat Lee · 25 Apr 2025
Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation
Meixi Zheng, Kehan Wu, Yanbo Fan, Rui Huang, Baoyuan Wu · AAML · 23 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao, Mingrui Wu, Karthik Prasad, Yuandong Tian, Zechun Liu · MoMe · 23 Apr 2025
VeLU: Variance-enhanced Learning Unit for Deep Neural Networks
Ashkan Shakarami, Yousef Yeganeh, Azade Farshad, Lorenzo Nicolè, Stefano Ghidoni, Nassir Navab · 21 Apr 2025
How Effective Can Dropout Be in Multiple Instance Learning?
Wenhui Zhu, Peijie Qiu, Xiwen Chen, Zhangsihao Yang, Aristeidis Sotiras, Abolfazl Razi, Yanjie Wang · 21 Apr 2025
Dueling Deep Reinforcement Learning for Financial Time Series
Bruno Giorgio · AIFin, AI4TS · 15 Apr 2025
An overview of condensation phenomenon in deep learning
Zhi-Qin John Xu, Yaoyu Zhang, Zhangchen Zhou · AI4CE · 13 Apr 2025
Sharpness-Aware Parameter Selection for Machine Unlearning
Saber Malekmohammadi, Hong kyu Lee, Li Xiong · MU · 08 Apr 2025
Scaling Graph Neural Networks for Particle Track Reconstruction
Alok Tripathy, A. Lazar, X. Ju, P. Calafiura, Katherine Yelick, A. Buluç · 07 Apr 2025
Randomised Splitting Methods and Stochastic Gradient Descent
Luke Shaw, Peter A. Whalley · 05 Apr 2025
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Chang-Bin Zhang, Jinhong Ni, Yujie Zhong, Kai Han · 3DV, VLM · 02 Apr 2025
Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions
Tahmid Hasan Prato, Seijoon Kim, Lizhong Chen, Sanghyun Hong · AAML · 02 Apr 2025
Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman, Lucius Bushnaq, Lee D. Sharkey · 31 Mar 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua · ViT · 30 Mar 2025
OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
S. Tyagi, Prateek Sharma · 21 Mar 2025
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Sunwoo Lee · 18 Mar 2025