arXiv:1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
Showing 50 of 1,554 citing papers
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
124
15
0
07 Jun 2023
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
F. Chen
D. Kunin
Atsushi Yamamura
Surya Ganguli
121
29
0
07 Jun 2023
Normalization Layers Are All That Sharpness-Aware Minimization Needs
Maximilian Mueller
Tiffany J. Vlaar
David Rolnick
Matthias Hein
87
24
0
07 Jun 2023
Optimal Transport Model Distributional Robustness
Van-Anh Nguyen
Trung Le
Anh Tuan Bui
Thanh-Toan Do
Dinh Q. Phung
OOD
77
4
0
07 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
152
15
0
05 Jun 2023
Enhance Diffusion to Improve Robust Generalization
Jianhui Sun
Sanchit Sinha
Aidong Zhang
69
4
0
05 Jun 2023
ReContrast: Domain-Specific Anomaly Detection via Contrastive Reconstruction
Jia Guo
Shuai Lu
Lize Jia
Weihang Zhang
Huiqi Li
94
31
0
05 Jun 2023
When Decentralized Optimization Meets Federated Learning
Hongchang Gao
My T. Thai
Jie Wu
FedML
96
23
0
05 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Dinesh Manocha
102
25
0
04 Jun 2023
Investigating Navigation Strategies in the Morris Water Maze through Deep Reinforcement Learning
A. Liu
Alla Borisyuk
48
7
0
01 Jun 2023
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression
Runtian Zhai
Bing Liu
Andrej Risteski
Zico Kolter
Pradeep Ravikumar
SSL
117
10
0
01 Jun 2023
Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks
Dan Zhao
104
6
0
01 Jun 2023
Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training
Rie Johnson
Tong Zhang
43
6
0
31 May 2023
Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning
M. Yashwanth
Gaurav Kumar Nayak
Aryaveer Singh
Yogesh Singh
Anirban Chakraborty
FedML
98
1
0
31 May 2023
SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Lawrence Wang
Stephen J. Roberts
116
0
0
29 May 2023
The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
Lei Wu
Weijie J. Su
MLT
93
23
0
27 May 2023
Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson
Dongyang Fan
Martin Jaggi
46
1
0
26 May 2023
Batch Model Consolidation: A Multi-Task Model Consolidation Framework
Iordanis Fostiropoulos
Jiaye Zhu
Laurent Itti
MoMe
CLL
54
3
0
25 May 2023
Sharpness-Aware Minimization Leads to Low-Rank Features
Maksym Andriushchenko
Dara Bahri
H. Mobahi
Nicolas Flammarion
AAML
124
25
0
25 May 2023
Implicit bias of SGD in $L_2$-regularized linear DNNs: One-way jumps from high to low rank
Zihan Wang
Arthur Jacot
83
21
0
25 May 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
54
0
0
25 May 2023
Stochastic Modified Equations and Dynamics of Dropout Algorithm
Zhongwang Zhang
Yuqing Li
Yaoyu Zhang
Z. Xu
43
9
0
25 May 2023
Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term
Yun Yue
Jiadi Jiang
Zhiling Ye
Ni Gao
Yongchao Liu
Kecheng Zhang
MLAU
ODL
113
14
0
25 May 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
113
8
0
25 May 2023
The Crucial Role of Normalization in Sharpness-Aware Minimization
Yan Dai
Kwangjun Ahn
S. Sra
120
19
0
24 May 2023
Momentum Provably Improves Error Feedback!
Ilyas Fatkhullin
Alexander Tyurin
Peter Richtárik
116
23
0
24 May 2023
On progressive sharpening, flat minima and generalisation
L. MacDonald
Jack Valmadre
Simon Lucey
80
4
0
24 May 2023
Sharpness-Aware Data Poisoning Attack
Pengfei He
Han Xu
Jie Ren
Yingqian Cui
Hui Liu
Charu C. Aggarwal
Jiliang Tang
AAML
151
8
0
24 May 2023
Transferring Learning Trajectories of Neural Networks
Daiki Chijiwa
57
3
0
23 May 2023
On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang
Chang-Wei Shi
Wu-Jun Li
FedML
AAML
92
0
0
23 May 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo Zhao
Robert Mansel Gower
Robin Walters
Rose Yu
MoMe
127
16
0
22 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
Itai Kreisler
Mor Shpigel Nacson
Daniel Soudry
Y. Carmon
73
15
0
22 May 2023
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy
R. Guerraoui
Ljiljana Dolamic
104
1
0
20 May 2023
Loss Spike in Training Neural Networks
Zhongwang Zhang
Z. Xu
72
7
0
20 May 2023
Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency
Lingfeng Shen
Weiting Tan
Boyuan Zheng
Daniel Khashabi
VLM
127
6
0
18 May 2023
Generalization Bounds for Neural Belief Propagation Decoders
Sudarshan Adiga
Xin Xiao
Ravi Tandon
Bane V. Vasic
Tamal Bose
BDL
AI4CE
64
5
0
17 May 2023
Sharpness & Shift-Aware Self-Supervised Learning
Ngoc N. Tran
S. Duong
Hoang Phan
Tung Pham
Dinh Q. Phung
Trung Le
SSL
71
1
0
17 May 2023
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification
Siyuan Huang
Bo Zhang
Botian Shi
Penglei Gao
Yikang Li
Hongsheng Li
3DPC
3DGS
72
14
0
16 May 2023
The Hessian perspective into the Nature of Convolutional Neural Networks
Sidak Pal Singh
Thomas Hofmann
Bernhard Schölkopf
96
11
0
16 May 2023
GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong
Joonsang Yu
Geondo Park
Dongyoon Han
Y. Yoo
78
4
0
15 May 2023
Quantization Aware Attack: Enhancing Transferable Adversarial Attacks by Model Quantization
Yulong Yang
Chenhao Lin
Qian Li
Zhengyu Zhao
Haoran Fan
Dawei Zhou
Nannan Wang
Tongliang Liu
Chao Shen
AAML
MQ
130
14
0
10 May 2023
Sharpness-Aware Minimization Alone can Improve Adversarial Robustness
Zeming Wei
Jingyu Zhu
Yihao Zhang
AAML
86
11
0
09 May 2023
Model-agnostic Measure of Generalization Difficulty
Akhilan Boopathy
Kevin Liu
Jaedong Hwang
Shu Ge
Asaad Mohammedsaleh
Ila Fiete
129
4
0
01 May 2023
An Adaptive Policy to Employ Sharpness-Aware Minimization
Weisen Jiang
Hansi Yang
Yu Zhang
James T. Kwok
AAML
128
34
0
28 Apr 2023
Pre-processing training data improves accuracy and generalisability of convolutional neural network based landscape semantic segmentation
A. Clark
S. Phinn
P. Scarth
22
3
0
28 Apr 2023
More Communication Does Not Result in Smaller Generalization Error in Federated Learning
Abdellatif Zaidi
Romain Chor
Milad Sefidgaran
FedML
AI4CE
90
10
0
24 Apr 2023
Hierarchical Weight Averaging for Deep Neural Networks
Xiaozhe Gu
Zixun Zhang
Yuncheng Jiang
Yaoyu Zhang
Ruimao Zhang
Shuguang Cui
Zhuguo Li
52
5
0
23 Apr 2023
Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard
Henry Rees
Guillermo Valle Pérez
A. Louis
UQCV
BDL
62
16
0
13 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
105
43
0
07 Apr 2023
On the Pareto Front of Multilingual Neural Machine Translation
Liang Chen
Shuming Ma
Dongdong Zhang
Furu Wei
Baobao Chang
MoE
79
5
0
06 Apr 2023