On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization
Jake B. Perazzone
Maroun Touma
Mingyue Ji
Kevin S. Chan
FedML
381
2
0
01 Mar 2025
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael
Iftach Arbel
Ofir Lindenbaum
Tom Tirer
426
2
0
26 Feb 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL AAML
1.3K
2
0
25 Feb 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
476
2
0
21 Feb 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min Lin
Ye Wang
DiffM TDI
618
93
0
21 Feb 2025
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
International Conference on Learning Representations (ICLR), 2025
Romain Chor
Milad Sefidgaran
Piotr Krasnowski
488
3
0
21 Feb 2025
Unveiling Mode Connectivity in Graph Neural Networks
Bingheng Li
Z. Chen
Haoyu Han
Shenglai Zeng
J. Liu
Shucheng Zhou
258
1
0
18 Feb 2025
Improving the Stability of GNN Force Field Models by Reducing Feature Correlation
Y. Zeng
Wenlong He
Ihor Vasyltsov
Jiaxin Wei
Ying Zhang
Lin Chen
Yuehua Dai
199
0
0
18 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
310
2
0
18 Feb 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin
Yingjie Lao
Tong Geng
Tan Yu
Weijie Zhao
AAML SILM
467
7
0
18 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
395
11
0
17 Feb 2025
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
International Conference on Learning Representations (ICLR), 2025
Xiuyuan Hu
Guoqing Liu
Can Chen
Yang Zhao
Ning Yang
Xue Liu
301
5
0
07 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
302
1
0
28 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
653
175
0
28 Jan 2025
On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre
Suset Barroso-Solares
M.A. Rodríguez-Pérez
Javier Pinto
247
7
0
25 Jan 2025
Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
Haocheng Luo
Tuan Truong
Tung Pham
Mehrtash Harandi
Dinh Q. Phung
Trung Le
233
12
0
22 Jan 2025
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
North American Actuarial Journal (NAAJ), 2023
Freek Holvoet
Katrien Antonio
Roel Henckaerts
396
7
0
20 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano
Blake Woodworth
MLT
406
2
0
15 Jan 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Yilang Zhang
Bingcong Li
G. Giannakis
AAML
214
0
0
11 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
352
8
0
10 Jan 2025
Towards Unraveling and Improving Generalization in World Models
Qiaoyi Fang
Weiyu Du
Hang Wang
Junshan Zhang
OOD
237
1
0
03 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
1.2K
0
0
30 Dec 2024
Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities
Lawrence Wang
Stephen J. Roberts
272
0
0
23 Dec 2024
Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jinping Zou
Xiaoge Deng
Tao Sun
334
1
0
22 Dec 2024
SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise SAM
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xingyu Lyu
Qianqian Xu
Zhiyong Yang
Shaojie Lyu
Qingming Huang
520
1
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Xingtai Lv
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM VLM
351
3
0
18 Dec 2024
Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration
D. Wolf
Alexander Braun
Markus Ulrich
526
0
0
18 Dec 2024
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Computer Vision and Pattern Recognition (CVPR), 2024
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
1.1K
6
0
18 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
International Conference on Learning Representations (ICLR), 2024
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
438
29
0
17 Dec 2024
LossLens: Diagnostics for Machine Learning through Loss Landscape Visual Analytics
IEEE Computer Graphics and Applications (IEEE CG&A), 2024
Tiankai Xie
Jiaqing Chen
Yaoqing Yang
Caleb Geniesse
Ge Shi
...
J. Cava
Michael W. Mahoney
Talita Perciano
Gunther H. Weber
Ross Maciejewski
297
1
0
17 Dec 2024
A Method for Enhancing Generalization of Adam by Multiple Integrations
AAAI Conference on Artificial Intelligence (AAAI), 2024
Long Jin
Han Nong
Liangming Chen
Zhenming Su
261
0
0
17 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Zhaoyu Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
1.0K
1
0
16 Dec 2024
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Bohan Li
Jianfeng Dong
Jiadong Wang
Yukai Shi
Yasheng Sun
...
Zhuang Ma
Baao Xie
Chao Ma
Yunbo Wang
Wenjun Zeng
DiffM
874
4
0
15 Dec 2024
Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization
Samuel Schapiro
Han Zhao
383
1
0
06 Dec 2024
Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits
Daniel Morales-Brotons
Thijs Vogels
Aymeric Dieuleveut
394
65
0
27 Nov 2024
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Neural Information Processing Systems (NeurIPS), 2024
Yunzhe Hu
Difan Zou
Dong Xu
383
3
0
26 Nov 2024
AI-Spectra: A Visual Dashboard for Model Multiplicity to Enhance Informed and Transparent Decision-Making
Engineering Interactive Computing System (EICS), 2024
Gilles Eerlings
Sebe Vanbrabant
Jori Liesenborgs
Gustavo Rovelo Ruiz
Davy Vanacken
Kris Luyten
243
5
0
14 Nov 2024
Enhancing generalization in high energy physics using white-box adversarial attacks
Franck Rothen
Samuel Klein
Matthew Leigh
J. A. Raine
AAML
373
1
0
14 Nov 2024
LA4SR: illuminating the dark proteome with generative AI
David R. Nelson
Ashish Kumar Jaiswal
Noha Ismail
Alexandra Mystikou
Kourosh Salehi-Ashtiani
169
0
0
11 Nov 2024
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
316
16
0
05 Nov 2024
R+R: Understanding Hyperparameter Effects in DP-SGD
Asia-Pacific Computer Systems Architecture Conference (ACSA), 2024
Felix Morsbach
J. Reubold
T. Strufe
236
1
0
04 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Neural Information Processing Systems (NeurIPS), 2024
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
539
4
0
04 Nov 2024
1st-Order Magic: Analysis of Sharpness-Aware Minimization
Nalin Tiwary
Siddarth Aananth
175
0
0
03 Nov 2024
Simplicity Bias via Global Convergence of Sharpness Minimization
International Conference on Machine Learning (ICML), 2024
Khashayar Gatmiry
Zhiyuan Li
Sashank J. Reddi
Stefanie Jegelka
264
2
0
21 Oct 2024
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Neural Information Processing Systems (NeurIPS), 2024
Bingcong Li
Liang Zhang
Niao He
284
9
0
18 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2024
R. Teo
Tan M. Nguyen
MoE
221
6
0
18 Oct 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
280
8
0
17 Oct 2024
Deep Model Merging: The Sister of Neural Network Interpretability -- A Survey
A. Khan
Todd Nief
Nathaniel Hudson
Mansi Sakarvadia
Daniel Grzenda
Aswathy Ajith
Jordan Pettyjohn
Kyle Chard
Ian Foster
MoMe
203
1
0
16 Oct 2024
From promise to practice: realizing high-performance decentralized training
International Conference on Learning Representations (ICLR), 2024
Zesen Wang
Jiaojiao Zhang
Xuyang Wu
M. Johansson
310
2
0
15 Oct 2024
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
Arpan Mukherjee
Shashanka Ubaru
K. Murugesan
Karthikeyan Shanmugam
A. Tajer
294
5
0
14 Oct 2024