1609.04836
Cited By
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,601 papers shown
Title
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization
Jake B. Perazzone
Maroun Touma
Mingyue Ji
Kevin S. Chan
FedML
187
0
0
01 Mar 2025
LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael
Iftach Arbel
Ofir Lindenbaum
Tom Tirer
216
2
0
26 Feb 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
670
0
0
25 Feb 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min Lin
Ye Wang
DiffM
TDI
402
64
0
21 Feb 2025
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Romain Chor
Milad Sefidgaran
Piotr Krasnowski
354
2
0
21 Feb 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
240
2
0
21 Feb 2025
Unveiling Mode Connectivity in Graph Neural Networks
Bingheng Li
Z. Chen
Haoyu Han
Shenglai Zeng
J. Liu
Jiliang Tang
112
1
0
18 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
156
2
0
18 Feb 2025
Improving the Stability of GNN Force Field Models by Reducing Feature Correlation
Y. Zeng
Wenlong He
Ihor Vasyltsov
Jiaxin Wei
Ying Zhang
Lin Chen
Yuehua Dai
106
0
0
18 Feb 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin
Yingjie Lao
Tong Geng
Tan Yu
Weijie Zhao
AAML
SILM
230
4
0
18 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra
Tianyu He
M. Barkeshli
259
8
0
17 Feb 2025
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
Xiuyuan Hu
Guoqing Liu
Can Chen
Yang Zhao
Jun Wang
Xue Liu
176
3
0
07 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
170
0
0
28 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
372
144
0
28 Jan 2025
On the use of neural networks for the structural characterization of polymeric porous materials
Jorge Torre
Suset Barroso-Solares
M.A. Rodríguez-Pérez
Javier Pinto
138
7
0
25 Jan 2025
Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
Haocheng Luo
Tuan Truong
Tung Pham
Mehrtash Harandi
Dinh Q. Phung
Trung Le
128
5
0
22 Jan 2025
Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff
Freek Holvoet
Katrien Antonio
Roel Henckaerts
240
3
0
20 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano
Blake Woodworth
MLT
146
2
0
15 Jan 2025
Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
Yilang Zhang
Bingcong Li
G. Giannakis
AAML
103
0
0
11 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
156
6
0
10 Jan 2025
Towards Unraveling and Improving Generalization in World Models
Qiaoyi Fang
Weiyu Du
Hang Wang
Junshan Zhang
OOD
107
0
0
03 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
638
0
0
30 Dec 2024
Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities
Lawrence Wang
Stephen J. Roberts
132
0
0
23 Dec 2024
Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks
Jinping Zou
Xiaoge Deng
Tao Sun
164
1
0
22 Dec 2024
SSE-SAM: Balancing Head and Tail Classes Gradually through Stage-Wise SAM
Xingyu Lyu
Qianqian Xu
Zhiyong Yang
Shaojie Lyu
Qingming Huang
257
1
0
18 Dec 2024
Optical aberrations in autonomous driving: Physics-informed parameterized temperature scaling for neural network uncertainty calibration
D. Wolf
Alexander Braun
Markus Ulrich
317
0
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VLM
215
0
0
18 Dec 2024
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
673
2
0
18 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
242
14
0
17 Dec 2024
LossLens: Diagnostics for Machine Learning through Loss Landscape Visual Analytics
Tiankai Xie
Jiaqing Chen
Yaoqing Yang
Caleb Geniesse
Ge Shi
...
J. Cava
Michael W. Mahoney
Talita Perciano
Gunther H. Weber
Ross Maciejewski
143
0
0
17 Dec 2024
A Method for Enhancing Generalization of Adam by Multiple Integrations
Long Jin
Han Nong
Liangming Chen
Zhenming Su
153
0
0
17 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Zhaoyu Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
605
1
0
16 Dec 2024
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
Bohan Li
Xin Jin
Jiadong Wang
Yukai Shi
Yasheng Sun
...
Zhuang Ma
Baao Xie
Chao Ma
Xiaokang Yang
Wenjun Zeng
DiffM
548
1
0
15 Dec 2024
Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization
Samuel Schapiro
Han Zhao
166
1
0
06 Dec 2024
Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits
Daniel Morales-Brotons
Thijs Vogels
Aymeric Dieuleveut
224
38
0
27 Nov 2024
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Yunzhe Hu
Difan Zou
Dong Xu
195
1
0
26 Nov 2024
AI-Spectra: A Visual Dashboard for Model Multiplicity to Enhance Informed and Transparent Decision-Making
Gilles Eerlings
Sebe Vanbrabant
Jori Liesenborgs
Gustavo Rovelo Ruiz
Davy Vanacken
Kris Luyten
107
3
0
14 Nov 2024
Enhancing generalization in high energy physics using white-box adversarial attacks
Franck Rothen
Samuel Klein
Matthew Leigh
J. A. Raine
AAML
117
1
0
14 Nov 2024
LA4SR: illuminating the dark proteome with generative AI
David R. Nelson
Ashish Kumar Jaiswal
Noha Ismail
Alexandra Mystikou
Kourosh Salehi-Ashtiani
89
0
0
11 Nov 2024
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
132
9
0
05 Nov 2024
R+R: Understanding Hyperparameter Effects in DP-SGD
Felix Morsbach
J. Reubold
T. Strufe
111
0
0
04 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
259
0
0
04 Nov 2024
1st-Order Magic: Analysis of Sharpness-Aware Minimization
Nalin Tiwary
Siddarth Aananth
90
0
0
03 Nov 2024
Simplicity Bias via Global Convergence of Sharpness Minimization
Khashayar Gatmiry
Zhiyuan Li
Sashank J. Reddi
Stefanie Jegelka
95
1
0
21 Oct 2024
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Bingcong Li
Liang Zhang
Niao He
153
10
0
18 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
139
3
0
18 Oct 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
86
4
0
17 Oct 2024
From promise to practice: realizing high-performance decentralized training
Zesen Wang
Jiaojiao Zhang
Xuyang Wu
M. Johansson
150
0
0
15 Oct 2024
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
Arpan Mukherjee
Shashanka Ubaru
K. Murugesan
Karthikeyan Shanmugam
A. Tajer
110
0
0
14 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
106
3
0
14 Oct 2024