arXiv:1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
A Comprehensive Survey of Continual Learning: Theory, Method and Application
Liyuan Wang
Xingxing Zhang
Hang Su
Jun Zhu
KELM
CLL
219
712
0
31 Jan 2023
Emergence of Maps in the Memories of Blind Navigation Agents
Erik Wijmans
Manolis Savva
Irfan Essa
Stefan Lee
Ari S. Morcos
Dhruv Batra
67
33
0
30 Jan 2023
Deep networks for system identification: a Survey
G. Pillonetto
Aleksandr Aravkin
Daniel Gedon
L. Ljung
Antônio H. Ribeiro
Thomas B. Schon
OOD
100
45
0
30 Jan 2023
The Hidden Power of Pure 16-bit Floating-Point Neural Networks
Juyoung Yun
Byungkon Kang
Zhoulai Fu
MQ
28
1
0
30 Jan 2023
Do We Really Need Graph Neural Networks for Traffic Forecasting?
Xu Liu
Yuxuan Liang
Chao Huang
Hengchang Hu
Yushi Cao
Bryan Hooi
Roger Zimmermann
AI4TS
93
22
0
30 Jan 2023
Pipe-BD: Pipelined Parallel Blockwise Distillation
Hongsun Jang
Jaewon Jung
Jaeyong Song
Joonsang Yu
Youngsok Kim
Jinho Lee
MoE
AI4CE
66
2
0
29 Jan 2023
Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Woojin Lee
Jaewook Lee
44
9
0
27 Jan 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients
Guihong Li
Yuedong Yang
Kartikeya Bhardwaj
R. Marculescu
117
63
0
26 Jan 2023
On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems
Philippe Gonzalez
T. S. Alstrøm
Tobias May
71
9
0
25 Jan 2023
ScaDLES: Scalable Deep Learning over Streaming data at the Edge
S. Tyagi
Martin Swany
42
6
0
21 Jan 2023
An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni
Luca Biggio
Antonio Orvieto
F. Proske
Hans Kersting
Aurelien Lucchi
110
15
0
19 Jan 2023
Catapult Dynamics and Phase Transitions in Quadratic Nets
David Meltzer
Junyu Liu
67
9
0
18 Jan 2023
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Jaewook Lee
78
13
0
16 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma
Chang-Shu Liu
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
137
9
0
31 Dec 2022
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
75
31
0
28 Dec 2022
Deep learning for size-agnostic inverse design of random-network 3D printed mechanical metamaterials
H. Pahlavani
Kostas Tsifoutis-Kazolis
P. Mody
Jie Zhou
M. J. Mirzaali
A. A. Zadpoor
AI4CE
64
41
0
22 Dec 2022
Domain Generalization with Correlated Style Uncertainty
Zheyu Zhang
Bin Wang
Debesh Jha
Ugur Demir
Ulas Bagci
OOD
106
6
0
20 Dec 2022
Colab NAS: Obtaining lightweight task-specific convolutional neural networks following Occam's razor
Andrea Mattia Garavagno
D. Leonardis
A. Frisoli
96
1
0
15 Dec 2022
A Statistical Model for Predicting Generalization in Few-Shot Classification
Yassir Bendou
Vincent Gripon
Bastien Pasdeloup
Lukas Mauch
Stefan Uhlich
Fabien Cardinaux
G. B. Hacene
Javier Alonso García
87
2
0
13 Dec 2022
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu
I. Kobyzev
Mehdi Rezagholizadeh
Ahmad Rashid
A. Ghodsi
Philippe Langlais
MoMe
100
11
0
12 Dec 2022
Accelerating Self-Supervised Learning via Efficient Training Strategies
Mustafa Taha Koçyiğit
Timothy M. Hospedales
Hakan Bilen
SSL
66
8
0
11 Dec 2022
Error-aware Quantization through Noise Tempering
Zheng Wang
Juncheng Billy Li
Shuhui Qu
Florian Metze
Emma Strubell
MQ
38
2
0
11 Dec 2022
Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
Xu Cai
Chaobing Song
Stephen J. Wright
Jelena Diakonikolas
80
14
0
09 Dec 2022
Adversarial Weight Perturbation Improves Generalization in Graph Neural Networks
Yihan Wu
Aleksandar Bojchevski
Heng Huang
AAML
99
30
0
09 Dec 2022
Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
D. Durfee
Ayan Acharya
S. Keerthi
Rahul Mazumder
AAML
53
5
0
07 Dec 2022
Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness
R. Seccia
Corrado Coppola
G. Liuzzi
L. Palagi
61
2
0
04 Dec 2022
PiPar: Pipeline Parallelism for Collaborative Machine Learning
Zihan Zhang
Philip Rodgers
Peter Kilpatrick
I. Spence
Blesson Varghese
FedML
80
3
0
01 Dec 2022
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov
Andrei Filatov
Teresa Yeo
Ajay Sohmshetty
Amir Zamir
OOD
132
10
0
01 Dec 2022
Adaptive adversarial training method for improving multi-scale GAN based on generalization bound theory
Jin-Lin Tang
B. Tao
Zeyu Gong
Zhoupin Yin
AI4CE
61
1
0
30 Nov 2022
Boosted Dynamic Neural Networks
Haichao Yu
Haoxiang Li
G. Hua
Gao Huang
Humphrey Shi
96
8
0
30 Nov 2022
Disentangling the Mechanisms Behind Implicit Regularization in SGD
Cheng-i Wang
Simran Kaur
Tanya Marwah
Saurabh Garg
Zachary Chase Lipton
FedML
100
2
0
29 Nov 2022
A survey of deep learning optimizers -- first and second order methods
Rohan Kashyap
ODL
97
7
0
28 Nov 2022
Exploring Temporal Information Dynamics in Spiking Neural Networks
Youngeun Kim
Yuhang Li
Hyoungseob Park
Yeshwanth Venkatesha
Anna Hambitzer
Priyadarshini Panda
86
35
0
26 Nov 2022
The Vanishing Decision Boundary Complexity and the Strong First Component
Hengshuai Yao
UQCV
59
0
0
25 Nov 2022
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Kazuki Osawa
Shigang Li
Torsten Hoefler
AI4CE
84
26
0
25 Nov 2022
Cross-Domain Ensemble Distillation for Domain Generalization
Kyung-Jin Lee
Sungyeon Kim
Suha Kwak
FedML
OOD
79
38
0
25 Nov 2022
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi
Marc Finzi
Sanyam Kapoor
Andres Potapczynski
Micah Goldblum
A. Wilson
BDL
MLT
AI4CE
87
62
0
24 Nov 2022
Improving Multi-task Learning via Seeking Task-based Flat Regions
Hoang Phan
Lam C. Tran
Ngoc N. Tran
Nhat Ho
Tuan Truong
Qi Lei
Nhat Ho
Dinh Q. Phung
Trung Le
209
11
0
24 Nov 2022
ModelDiff: A Framework for Comparing Learning Algorithms
Harshay Shah
Sung Min Park
Andrew Ilyas
Aleksander Madry
SyDa
104
29
0
22 Nov 2022
Efficient Generalization Improvement Guided by Random Weight Perturbation
Tao Li
Wei Yan
Zehao Lei
Yingwen Wu
Kun Fang
Ming-Hsuan Yang
Xiaolin Huang
AAML
68
6
0
21 Nov 2022
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation
Jiawei Du
Yiding Jiang
Vincent Y. F. Tan
Qiufeng Wang
Haizhou Li
DD
99
119
0
20 Nov 2022
SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness
Gonçalo Mordido
Sébastien Henwood
Sarath Chandar
François Leduc-Primeau
AAML
42
0
0
18 Nov 2022
REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan
Hanie Sedghi
O. Saukh
R. Entezari
Behnam Neyshabur
MoMe
122
101
0
15 Nov 2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao
Tao Jiang
Feng Wang
Lin Li
Q. Hong
86
10
0
14 Nov 2022
How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen
Tengyu Ma
Zhiyuan Li
AAML
85
50
0
10 Nov 2022
Instance-Dependent Generalization Bounds via Optimal Transport
Songyan Hou
Parnian Kassraie
Anastasis Kratsios
Andreas Krause
Jonas Rothfuss
100
6
0
02 Nov 2022
Class Interference of Deep Neural Networks
Dongcui Diao
Hengshuai Yao
Bei Jiang
46
1
0
31 Oct 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo Zhao
I. Ganev
Robin Walters
Rose Yu
Nima Dehmamy
105
20
0
31 Oct 2022
Flatter, faster: scaling momentum for optimal speedup of SGD
Aditya Cowsik
T. Can
Paolo Glorioso
98
5
0
28 Oct 2022
Watermarking for Out-of-distribution Detection
Qizhou Wang
Feng Liu
Yonggang Zhang
Jing Zhang
Chen Gong
Tongliang Liu
Bo Han
OODD
88
32
0
27 Oct 2022