arXiv: 1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
Community: ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (50 of 1,554 papers shown)
Hyperplane Arrangements of Trained ConvNets Are Biased · Matteo Gamba, S. Carlsson, Hossein Azizpour, Mårten Björkman · 41 / 5 / 0 · 17 Mar 2020
Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations · Subhajit Chaudhury, T. Yamasaki · 24 / 3 / 0 · 14 Mar 2020
Interference and Generalization in Temporal Difference Learning · Emmanuel Bengio, Joelle Pineau, Doina Precup · 77 / 61 / 0 · 13 Mar 2020
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey · Zhenheng Tang, Shaoshuai Shi, Wei Wang, Yue Liu, Xiaowen Chu · 80 / 49 / 0 · 10 Mar 2020
Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule · Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu · 75 / 29 / 0 · 09 Mar 2020
AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks · Majed El Helou, Frederike Dumbgen, Sabine Süsstrunk · CLL, AI4CE · 32 / 2 / 0 · 07 Mar 2020
Communication optimization strategies for distributed deep neural network training: A survey · Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao · 116 / 12 / 0 · 06 Mar 2020
The large learning rate phase of deep learning: the catapult mechanism · Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · ODL · 218 / 241 / 0 · 04 Mar 2020
Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited · Wesley J. Maddox, Gregory W. Benton, A. Wilson · 136 / 61 / 0 · 04 Mar 2020
Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond · Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, B. Kailkhura, Xinyu Lin, Cho-Jui Hsieh · AAML · 55 / 12 / 0 · 28 Feb 2020
The Implicit and Explicit Regularization Effects of Dropout · Colin Wei, Sham Kakade, Tengyu Ma · 116 / 118 / 0 · 28 Feb 2020
Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization · S. Chatterjee · ODL, OOD · 117 / 51 / 0 · 25 Feb 2020
The Two Regimes of Deep Network Training · Guillaume Leclerc, Aleksander Madry · 94 / 45 / 0 · 24 Feb 2020
De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors · A. Banerjee, Tiancong Chen, Yingxue Zhou · BDL · 70 / 8 / 0 · 23 Feb 2020
Communication-Efficient Edge AI: Algorithms and Systems · Yuanming Shi, Kai Yang, Tao Jiang, Jun Zhang, Khaled B. Letaief · GNN · 99 / 334 / 0 · 22 Feb 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks · Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras · 88 / 164 / 0 · 21 Feb 2020
Parallel and distributed asynchronous adaptive stochastic gradient methods · Yangyang Xu, Yibo Xu, Yonggui Yan, Colin Sutcher-Shepard, Leopold Grinberg, Jiewei Chen · 30 / 2 / 0 · 21 Feb 2020
Bayesian Deep Learning and a Probabilistic Perspective of Generalization · A. Wilson, Pavel Izmailov · UQCV, BDL, OOD · 148 / 656 / 0 · 20 Feb 2020
Do We Need Zero Training Loss After Achieving Zero Training Error? · Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama · AI4CE · 70 / 137 / 0 · 20 Feb 2020
Revisiting Training Strategies and Generalization Performance in Deep Metric Learning · Karsten Roth, Timo Milbich, Samarth Sinha, Prateek Gupta, Bjorn Ommer, Joseph Paul Cohen · 163 / 173 / 0 · 19 Feb 2020
Unique Properties of Flat Minima in Deep Networks · Rotem Mulayoff, T. Michaeli · ODL · 59 / 4 / 0 · 11 Feb 2020
Think Global, Act Local: Relating DNN generalisation and node-level SNR · Paul Norridge · 24 / 1 / 0 · 11 Feb 2020
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima · Zeke Xie, Issei Sato, Masashi Sugiyama · ODL · 127 / 17 / 0 · 10 Feb 2020
Large Batch Training Does Not Need Warmup · Zhouyuan Huo, Bin Gu, Heng-Chiao Huang · AI4CE, ODL · 47 / 5 / 0 · 04 Feb 2020
Optimizing Loss Functions Through Multivariate Taylor Polynomial Parameterization · Santiago Gonzalez, Risto Miikkulainen · 55 / 9 / 0 · 31 Jan 2020
The Case for Bayesian Deep Learning · A. Wilson · UQCV, BDL, OOD · 132 / 114 / 0 · 29 Jan 2020
Identifying Mislabeled Data using the Area Under the Margin Ranking · Geoff Pleiss, Tianyi Zhang, Ethan R. Elenberg, Kilian Q. Weinberger · NoLa · 119 / 274 / 0 · 28 Jan 2020
Automatic phantom test pattern classification through transfer learning with deep neural networks · Rafael B. Fricks, Justin Solomon, Ehsan Samei · MedIm · 30 / 0 / 0 · 22 Jan 2020
A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis · Christopher J. Urban, Daniel J. Bauer · BDL · 64 / 33 / 0 · 22 Jan 2020
Understanding Why Neural Networks Generalize Well Through GSNR of Parameters · Jinlong Liu, Guo-qing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang · AI4CE · 143 / 50 / 0 · 21 Jan 2020
SEERL: Sample Efficient Ensemble Reinforcement Learning · Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul · 65 / 19 / 0 · 15 Jan 2020
Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification · Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen · 79 / 106 / 0 · 15 Jan 2020
Understanding Generalization in Deep Learning via Tensor Methods · Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang · 112 / 28 / 0 · 14 Jan 2020
Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation · Madan Ravi Ganesh, Jason J. Corso · ODL · 54 / 10 / 0 · 13 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well · Vipul Gupta, S. Serrano, D. DeCoste · MoMe · 83 / 60 / 0 · 07 Jan 2020
Relative Flatness and Generalization · Henning Petzka, Michael Kamp, Linara Adilova, C. Sminchisescu, Mario Boley · 87 / 78 / 0 · 03 Jan 2020
'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space · Moshir Harsh, J. Tubiana, Simona Cocco, R. Monasson · 49 / 15 / 0 · 30 Dec 2019
CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity · Konpat Preechakul, B. Kijsirikul · ODL · 38 / 3 / 0 · 24 Dec 2019
Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks · Aleksandr Shevchenko, Marco Mondelli · 196 / 37 / 0 · 20 Dec 2019
Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification · Bennet Breier, A. Onken · 36 / 4 / 0 · 20 Dec 2019
Optimization for deep learning: theory and algorithms · Ruoyu Sun · ODL · 137 / 169 / 0 · 19 Dec 2019
Tangent Space Separability in Feedforward Neural Networks · Balint Daroczy, Rita Aleksziev, András A. Benczúr · 45 / 3 / 0 · 18 Dec 2019
Learning under Model Misspecification: Applications to Variational and Ensemble methods · A. Masegosa · 16 / 1 / 0 · 18 Dec 2019
On the Bias-Variance Tradeoff: Textbooks Need an Update · Brady Neal · 43 / 18 / 0 · 17 Dec 2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis · Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin · MoMe · 181 / 630 / 0 · 11 Dec 2019
Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability · Shuaicheng Liu, Ze Zhang, Kai Song, B. Zeng · 24 / 1 / 0 · 10 Dec 2019
InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers · T. Nguyen, Animesh Garg, Richard G. Baraniuk, Anima Anandkumar · TPM · 104 / 9 / 0 · 09 Dec 2019
Observational Overfitting in Reinforcement Learning · Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur · OffRL · 124 / 140 / 0 · 06 Dec 2019
Fantastic Generalization Measures and Where to Find Them · Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio · AI4CE · 148 / 611 / 0 · 04 Dec 2019
The Group Loss for Deep Metric Learning · Ismail Elezi, Sebastiano Vascon, Alessandro Torcinovich, Marcello Pelillo, Laura Leal-Taixe · 175 / 51 / 0 · 01 Dec 2019