arXiv: 1609.04836 (v2, latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models - Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma (AI4CE) - 25 Oct 2022
Deep Neural Networks as the Semi-classical Limit of Topological Quantum Neural Networks: The problem of generalisation - A. Marcianò, De-Wei Chen, Filippo Fabrocini, C. Fields, M. Lulli, Emanuele Zappala (GNN) - 25 Oct 2022
Sufficient Invariant Learning for Distribution Shift - Taero Kim, Sungjun Lim, Kyungwoo Song (OOD) - 24 Oct 2022
K-SAM: Sharpness-Aware Minimization at the Speed of SGD - Renkun Ni, Ping Yeh-Chiang, Jonas Geiping, Micah Goldblum, A. Wilson, Tom Goldstein - 23 Oct 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes - O. Oyedotun, Konstantinos Papadopoulos, Djamila Aouada (AI4CE) - 21 Oct 2022
Large-batch Optimization for Dense Visual Predictions - Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo (VLM) - 20 Oct 2022
Motion correction in MRI using deep learning and a novel hybrid loss function - Lei Zhang, Xiaoke Wang, Michael Rawson, R. Balan, E. Herskovits, E. Melhem, Linda Chang, Ze Wang, T. Ernst (MedIm) - 19 Oct 2022
Rethinking Sharpness-Aware Minimization as Variational Inference - Szilvia Ujváry, Zsigmond Telek, A. Kerekes, Anna Mészáros, Ferenc Huszár - 19 Oct 2022
Vision Transformers provably learn spatial structure - Samy Jelassi, Michael E. Sander, Yuan-Fang Li (ViT, MLT) - 13 Oct 2022
SQuAT: Sharpness- and Quantization-Aware Training for BERT - Zheng Wang, Juncheng Billy Li, Shuhui Qu, Florian Metze, Emma Strubell (MQ) - 13 Oct 2022
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization - Zhiyuan Zhang, Ruixuan Luo, Qi Su, Xueting Sun - 13 Oct 2022
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks - A. K. Akash, Sixu Li, Nicolas García Trillos - 13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities - Brian Bartoldson, B. Kailkhura, Davis W. Blalock - 13 Oct 2022
On the Effectiveness of Lipschitz-Driven Rehearsal in Continual Learning - Lorenzo Bonicelli, Matteo Boschini, Angelo Porrello, C. Spampinato, Simone Calderara (CLL) - 12 Oct 2022
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models - Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, Dacheng Tao (AAML) - 11 Oct 2022
Stable and Efficient Adversarial Training through Local Linearization - Zhuorong Li, Daiwei Yu (AAML) - 11 Oct 2022
SGD with Large Step Sizes Learns Sparse Features - Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion - 11 Oct 2022
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach - Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao (AAML) - 11 Oct 2022
TAN Without a Burn: Scaling Laws of DP-SGD - Tom Sander, Pierre Stock, Alexandre Sablayrolles (FedML) - 07 Oct 2022
Invariant Aggregator for Defending against Federated Backdoor Attacks - Xiaoya Wang, Dimitrios Dimitriadis, Oluwasanmi Koyejo, Shruti Tople (FedML) - 04 Oct 2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals - Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, J. Uesato, Zachary Kenton - 04 Oct 2022
MEDFAIR: Benchmarking Fairness for Medical Imaging - Yongshuo Zong, Yongxin Yang, Timothy M. Hospedales (OOD) - 04 Oct 2022
TripleE: Easy Domain Generalization via Episodic Replay - Xuelong Li, Hongyu Ren, Huifeng Yao, Ziwei Liu - 04 Oct 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima - Peter L. Bartlett, Philip M. Long, Olivier Bousquet - 04 Oct 2022
Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning - Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, Aditya Akella - 30 Sep 2022
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability - Alexandru Damian, Eshaan Nichani, Jason D. Lee - 30 Sep 2022
Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel - Sungyub Kim, Si-hun Park, Kyungsu Kim, Eunho Yang (BDL) - 30 Sep 2022
Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization - Danni Peng, Sinno Jialin Pan - 29 Sep 2022
Label driven Knowledge Distillation for Federated Learning with non-IID Data - Minh-Duong Nguyen, Quoc-Viet Pham, D. Hoang, Long Tran-Thanh, Diep N. Nguyen, Won Joo Hwang - 29 Sep 2022
Exploring the Relationship between Architecture and Adversarially Robust Generalization - Aishan Liu, Shiyu Tang, Siyuan Liang, Ruihao Gong, Boxi Wu, Xianglong Liu, Dacheng Tao (AAML) - 28 Sep 2022
A micromechanics-based recurrent neural networks model for path-dependent cyclic deformation of short fiber composites - J. Friemann, B. Dashtbozorg, Mikael Fagerström, S. Mirkhalaf (AI4CE) - 27 Sep 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity - Benoit Dherin, Michael Munn, M. Rosca, David Barrett - 27 Sep 2022
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization - Gábor Melis (MoMe) - 26 Sep 2022
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases - James Harrison, Luke Metz, Jascha Narain Sohl-Dickstein - 22 Sep 2022
Deep Double Descent via Smooth Interpolation - Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour - 21 Sep 2022
Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning - Christian Raymond, Qi Chen, Bing Xue, Mengjie Zhang (FedML) - 19 Sep 2022
Is Stochastic Gradient Descent Near Optimal? - Yifan Zhu, Hong Jun Jeon, Benjamin Van Roy - 18 Sep 2022
Towards Bridging the Performance Gaps of Joint Energy-based Models - Xiulong Yang, Qing Su, Shihao Ji (VLM) - 16 Sep 2022
Losing momentum in continuous-time stochastic optimisation - Kexin Jin, J. Latz, Chenguang Liu, Alessandro Scagliotti - 08 Sep 2022
Information Maximization for Extreme Pose Face Recognition - Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Sobhan Soleymani, Moktari Mostofa, Nasser M. Nasrabadi (CVBM) - 07 Sep 2022
Generalisation under gradient descent via deterministic PAC-Bayes - Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet - 06 Sep 2022
Investigating the Impact of Model Misspecification in Neural Simulation-based Inference - Patrick W Cannon, Daniel Ward, Sebastian M. Schmon - 05 Sep 2022
Super-model ecosystem: A domain-adaptation perspective - Fengxiang He, Dacheng Tao (DiffM) - 30 Aug 2022
Visualizing high-dimensional loss landscapes with Hessian directions - Lucas Böttcher, Gregory R. Wheeler - 28 Aug 2022
On the Implicit Bias in Deep-Learning Algorithms - Gal Vardi (FedML, AI4CE) - 26 Aug 2022
FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification - Yunqing Zhao, Ngai-Man Cheung (BDL) - 23 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective - Chanwoo Park, Sangdoo Yun, Sanghyuk Chun (AAML) - 21 Aug 2022
Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification - Quanshi Zhang, Xu Cheng, Yilan Chen, Zhefan Rao - 18 Aug 2022
Object Detection for Autonomous Dozers - Chunfang Liu, Burhaneddin Yaman - 17 Aug 2022
On the generalization of learning algorithms that do not converge - N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka (MLT) - 16 Aug 2022