ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1609.04836
  4. Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp
  Minima
v1v2 (latest)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
ArXiv (abs)PDFHTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Title
Gradient Extrapolation for Debiased Representation Learning
Gradient Extrapolation for Debiased Representation Learning
Ihab Asaad
M. Shadaydeh
Joachim Denzler
83
0
0
17 Mar 2025
High-entropy Advantage in Neural Networks' Generalizability
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
Wei Wei
Yue Shang
Ge Zhang
AI4CE
119
0
0
17 Mar 2025
Layer-wise Update Aggregation with Recycling for Communication-Efficient Federated Learning
Jisoo Kim
Sungmin Kang
Sunwoo Lee
FedML
74
0
0
14 Mar 2025
Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix
Junbiao Pang
Tianyang Cai
132
1
0
14 Mar 2025
Analyzing the Role of Permutation Invariance in Linear Mode Connectivity
Keyao Zhan
Puheng Li
Lei Wu
MoMe
111
0
0
13 Mar 2025
SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang
Xiongwei Zhao
Qihao Sun
Ke Wang
Ao Chen
Peng Kang
3DGS
136
6
0
07 Mar 2025
Sharpness-Aware Minimization: General Analysis and Improved Rates
Dimitris Oikonomou
Nicolas Loizou
89
1
0
04 Mar 2025
Deep Learning is Not So Mysterious or Different
Andrew Gordon Wilson
93
6
0
03 Mar 2025
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization
Jake B. Perazzone
Shiqiang Wang
Mingyue Ji
Kevin S. Chan
FedML
125
0
0
01 Mar 2025
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael
Iftach Arbel
Ofir Lindenbaum
Tom Tirer
167
1
0
26 Feb 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODLAAML
506
0
0
25 Feb 2025
Reasoning Bias of Next Token Prediction Training
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
191
2
0
21 Feb 2025
On Memorization in Diffusion Models
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min Lin
Ye Wang
DiffMTDI
340
55
0
21 Feb 2025
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Romain Chor
Milad Sefidgaran
Piotr Krasnowski
282
2
0
21 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
124
1
0
18 Feb 2025
Unveiling Mode Connectivity in Graph Neural Networks
Unveiling Mode Connectivity in Graph Neural Networks
Bingheng Li
Z. Chen
Haoyu Han
Shenglai Zeng
J. Liu
Jiliang Tang
86
1
0
18 Feb 2025
Improving the Stability of GNN Force Field Models by Reducing Feature Correlation
Improving the Stability of GNN Force Field Models by Reducing Feature Correlation
Y. Zeng
Wenlong He
Ihor Vasyltsov
Jiaxin Wei
Ying Zhang
Lin Chen
Yuehua Dai
75
0
0
18 Feb 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin
Yingjie Lao
Tong Geng
Tan Yu
Weijie Zhao
AAMLSILM
134
3
0
18 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
123
7
0
17 Feb 2025
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
Xiuyuan Hu
Guoqing Liu
Can Chen
Yang Zhao
Jun Wang
Xue Liu
113
2
0
07 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
121
0
0
28 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
297
125
0
28 Jan 2025
On the use of neural networks for the structural characterization of polymeric porous materials
On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre
Suset Barroso-Solares
M.A. Rodríguez-Pérez
Javier Pinto
105
5
0
25 Jan 2025
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
Freek Holvoet
Katrien Antonio
Roel Henckaerts
160
3
0
20 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano
Blake Woodworth
MLT
94
1
0
15 Jan 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
Yilang Zhang
Bingcong Li
G. Giannakis
AAML
53
0
0
11 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
115
4
0
10 Jan 2025
Towards Unraveling and Improving Generalization in World Models
Qiaoyi Fang
Weiyu Du
Hang Wang
Junshan Zhang
OOD
63
0
0
03 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
464
0
0
30 Dec 2024
Can Stability be Detrimental? Better Generalization through Gradient
  Descent Instabilities
Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities
Lawrence Wang
Stephen J. Roberts
91
0
0
23 Dec 2024
Sharpness-Aware Minimization with Adaptive Regularization for Training
  Deep Neural Networks
Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks
Jinping Zou
Xiaoge Deng
Tao Sun
106
1
0
22 Dec 2024
SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise
  SAM
SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise SAM
Xingyu Lyu
Qianqian Xu
Zhiyong Yang
Shaojie Lyu
Qingming Huang
198
1
0
18 Dec 2024
Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration
Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration
D. Wolf
Alexander Braun
Markus Ulrich
266
0
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
Maosong Sun
MLLMVLM
145
0
0
18 Dec 2024
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
499
1
0
18 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small
  LLMs
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
197
7
0
17 Dec 2024
LossLens: Diagnostics for Machine Learning through Loss Landscape Visual
  Analytics
LossLens: Diagnostics for Machine Learning through Loss Landscape Visual Analytics
Tiankai Xie
Jiaqing Chen
Yaoqing Yang
Caleb Geniesse
Ge Shi
...
J. Cava
Michael W. Mahoney
Talita Perciano
Gunther H. Weber
Ross Maciejewski
102
0
0
17 Dec 2024
A Method for Enhancing Generalization of Adam by Multiple Integrations
A Method for Enhancing Generalization of Adam by Multiple Integrations
Long Jin
Han Nong
Liangming Chen
Zhenming Su
101
0
0
17 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Meta Curvature-Aware Minimization for Domain Generalization
Zhaoyu Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
433
1
0
16 Dec 2024
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D
  Scene Generation
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
Bohan Li
Xin Jin
Jiadong Wang
Yukai Shi
Yasheng Sun
...
Zhuang Ma
Baao Xie
Chao Ma
Xiaokang Yang
Wenjun Zeng
DiffM
420
1
0
15 Dec 2024
Towards Understanding the Role of Sharpness-Aware Minimization
  Algorithms for Out-of-Distribution Generalization
Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization
Samuel Schapiro
Han Zhao
137
1
0
06 Dec 2024
Exponential Moving Average of Weights in Deep Learning: Dynamics and
  Benefits
Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits
Daniel Morales-Brotons
Thijs Vogels
Hadrien Hendrikx
188
23
0
27 Nov 2024
An In-depth Investigation of Sparse Rate Reduction in Transformer-like
  Models
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Yunzhe Hu
Difan Zou
Dong Xu
123
1
0
26 Nov 2024
AI-Spectra: A Visual Dashboard for Model Multiplicity to Enhance
  Informed and Transparent Decision-Making
AI-Spectra: A Visual Dashboard for Model Multiplicity to Enhance Informed and Transparent Decision-Making
Gilles Eerlings
Sebe Vanbrabant
Jori Liesenborgs
Gustavo Rovelo Ruiz
Davy Vanacken
Kris Luyten
76
2
0
14 Nov 2024
Enhancing generalization in high energy physics using white-box
  adversarial attacks
Enhancing generalization in high energy physics using white-box adversarial attacks
Franck Rothen
Samuel Klein
Matthew Leigh
T. Golling
AAML
55
1
0
14 Nov 2024
LA4SR: illuminating the dark proteome with generative AI
LA4SR: illuminating the dark proteome with generative AI
David R. Nelson
Ashish Kumar Jaiswal
Noha Ismail
Alexandra Mystikou
Kourosh Salehi-Ashtiani
40
0
0
11 Nov 2024
Photon: Federated LLM Pre-Training
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
83
9
0
05 Nov 2024
R+R:Understanding Hyperparameter Effects in DP-SGD
R+R:Understanding Hyperparameter Effects in DP-SGD
Felix Morsbach
J. Reubold
T. Strufe
77
0
0
04 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
144
0
0
04 Nov 2024
1st-Order Magic: Analysis of Sharpness-Aware Minimization
1st-Order Magic: Analysis of Sharpness-Aware Minimization
Nalin Tiwary
Siddarth Aananth
55
0
0
03 Nov 2024
Previous
12345...303132
Next