On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,585 papers shown
Uniform Loss vs. Specialized Optimization: A Comparative Analysis in Multi-Task Learning
Gabriel S. Gama
Valdir Grassi Jr
MoMe
159
0
0
15 May 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Xiang He
Dongcheng Zhao
Yang Li
Qingqun Kong
Xin Yang
Yi Zeng
130
0
0
15 May 2025
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
Mamba
202
1
0
13 May 2025
Block-Biased Mamba for Long-Range Sequence Processing
Annan Yu
N. Benjamin Erichson
Mamba
155
1
0
13 May 2025
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
Lianbo Ma
Jianlun Ma
Yuee Zhou
Guoyang Xie
Qiang He
Zhichao Lu
MQ
134
0
0
08 May 2025
Sharpness-Aware Minimization with Z-Score Gradient Filtering
Juyoung Yun
389
0
0
05 May 2025
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Zhi-Quan Luo
Jianfeng Yao
Ruoyu Sun
109
1
0
05 May 2025
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
Sicong Li
Qianqian Xu
Zhiyong Yang
Zitai Wang
Li Zhang
Xiaochun Cao
Qingming Huang
202
0
0
03 May 2025
Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks
Konstantinos I. Roumeliotis
Ranjan Sapkota
Manoj Karkee
Nikolaos D. Tselikas
Dimitrios K. Nasiopoulos
130
1
0
29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL ReLM LRM
425
90
0
29 Apr 2025
The effect of the number of parameters and the number of local feature patches on loss landscapes in distributed quantum neural networks
Yoshiaki Kawase
156
0
0
27 Apr 2025
FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement
Kangbiao Shi
Yixu Feng
Tao Hu
Yu Cao
Peng Wu
Yijin Liang
Y. Zhang
Qingsen Yan
123
0
0
27 Apr 2025
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
299
0
0
25 Apr 2025
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
179
0
0
23 Apr 2025
Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation
Meixi Zheng
Kehan Wu
Yanbo Fan
Rui Huang
Baoyuan Wu
AAML
99
0
0
23 Apr 2025
How Effective Can Dropout Be in Multiple Instance Learning ?
Wenhui Zhu
Peijie Qiu
Xiwen Chen
Zhangsihao Yang
Aristeidis Sotiras
Abolfazl Razi
Yanjie Wang
221
2
0
21 Apr 2025
VeLU: Variance-enhanced Learning Unit for Deep Neural Networks
Ashkan Shakarami
Yousef Yeganeh
Azade Farshad
Lorenzo Nicolè
Stefano Ghidoni
Nassir Navab
143
1
0
21 Apr 2025
Dueling Deep Reinforcement Learning for Financial Time Series
Bruno Giorgio
AIFin AI4TS
102
0
0
15 Apr 2025
An overview of condensation phenomenon in deep learning
Zhi-Qin John Xu
Yaoyu Zhang
Zhangchen Zhou
AI4CE
100
5
0
13 Apr 2025
Sharpness-Aware Parameter Selection for Machine Unlearning
Saber Malekmohammadi
Hong kyu Lee
Li Xiong
MU
651
0
0
08 Apr 2025
Scaling Graph Neural Networks for Particle Track Reconstruction
Alok Tripathy
A. Lazar
X. Ju
P. Calafiura
Katherine Yelick
A. Buluç
117
0
0
07 Apr 2025
Randomised Splitting Methods and Stochastic Gradient Descent
Luke Shaw
Peter A. Whalley
133
1
0
05 Apr 2025
Hessian-aware Training for Enhancing DNNs Resilience to Parameter Corruptions
Tahmid Hasan Prato
Seijoon Kim
Lizhong Chen
Sanghyun Hong
AAML
133
0
0
02 Apr 2025
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Chang-Bin Zhang
Jinhong Ni
Yujie Zhong
Kai Han
3DV VLM
206
2
0
02 Apr 2025
Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition
Brianna Chrisman
Lucius Bushnaq
Lee D. Sharkey
106
1
0
31 Mar 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao
Yang Shen
Jinyang Guo
Yazhou Yao
Xiansheng Hua
ViT
179
0
0
30 Mar 2025
OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
S. Tyagi
Prateek Sharma
176
0
0
21 Mar 2025
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Sunwoo Lee
177
1
0
18 Mar 2025
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
Wei Wei
Yue Shang
Ge Zhang
AI4CE
153
0
0
17 Mar 2025
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad
M. Shadaydeh
Joachim Denzler
117
0
0
17 Mar 2025
Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning
Jisoo Kim
Sungmin Kang
Sunwoo Lee
FedML
87
0
0
14 Mar 2025
Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix
Junbiao Pang
Tianyang Cai
165
1
0
14 Mar 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
Keyao Zhan
Puheng Li
Lei Wu
MoMe
145
0
0
13 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
163
2
0
07 Mar 2025
Sharpness-Aware Minimization: General Analysis and Improved Rates
Dimitris Oikonomou
Nicolas Loizou
131
3
0
04 Mar 2025
Deep Learning is Not So Mysterious or Different
Andrew Gordon Wilson
134
10
0
03 Mar 2025
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization
Jake B. Perazzone
Maroun Touma
Mingyue Ji
Kevin S. Chan
FedML
179
0
0
01 Mar 2025
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael
Iftach Arbel
Ofir Lindenbaum
Tom Tirer
212
2
0
26 Feb 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL AAML
666
0
0
25 Feb 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
240
2
0
21 Feb 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min Lin
Ye Wang
DiffM TDI
402
62
0
21 Feb 2025
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Romain Chor
Milad Sefidgaran
Piotr Krasnowski
350
2
0
21 Feb 2025
Unveiling Mode Connectivity in Graph Neural Networks
Bingheng Li
Z. Chen
Haoyu Han
Shenglai Zeng
J. Liu
Jiliang Tang
112
1
0
18 Feb 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin
Yingjie Lao
Tong Geng
Tan Yu
Weijie Zhao
AAML SILM
218
4
0
18 Feb 2025
Improving the Stability of GNN Force Field Models by Reducing Feature Correlation
Y. Zeng
Wenlong He
Ihor Vasyltsov
Jiaxin Wei
Ying Zhang
Lin Chen
Yuehua Dai
106
0
0
18 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
156
2
0
18 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
255
8
0
17 Feb 2025
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
Xiuyuan Hu
Guoqing Liu
Can Chen
Yang Zhao
Jun Wang
Xue Liu
168
3
0
07 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
162
0
0
28 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
356
144
0
28 Jan 2025