1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension
Yunfei Teng
Wenbo Gao
F. Chalus
A. Choromańska
Shiqian Ma
Adrian Weller
134
12
0
24 May 2019
Loss Surface Modality of Feed-Forward Neural Network Architectures
Anna Sergeevna Bosman
A. Engelbrecht
Mardé Helbig
43
9
0
24 May 2019
Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks
Yaoyu Zhang
Zhi-Qin John Xu
Tao Luo
Zheng Ma
MLT
AI4CE
130
38
0
24 May 2019
The role of invariance in spectral complexity-based generalization bounds
Konstantinos Pitas
Andreas Loukas
Mike Davies
P. Vandergheynst
BDL
16
1
0
23 May 2019
Improving Neural Networks by Adopting Amplifying and Attenuating Neurons
Seongmun Jung
O. Kwon
16
0
0
23 May 2019
Shaping the learning landscape in neural networks around wide flat minima
Carlo Baldassi
Fabrizio Pittorino
R. Zecchina
MLT
75
84
0
20 May 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
Mor Shpigel Nacson
Suriya Gunasekar
Jason D. Lee
Nathan Srebro
Daniel Soudry
92
94
0
17 May 2019
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
Linfeng Zhang
Jiebo Song
Anni Gao
Jingwei Chen
Chenglong Bao
Kaisheng Ma
FedML
85
865
0
17 May 2019
Orthogonal Deep Neural Networks
Kui Jia
Shuai Li
Yuxin Wen
Tongliang Liu
Dacheng Tao
93
134
0
15 May 2019
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Wu Dong
Murat Keçeli
Rafael Vescovi
Hanyu Li
Corey Adams
...
T. Uram
V. Vishwanath
N. Ferrier
B. Kasthuri
P. Littlewood
FedML
AI4CE
40
9
0
13 May 2019
Interpreting and Evaluating Neural Network Robustness
Fuxun Yu
Zhuwei Qin
Chenchen Liu
Liang Zhao
Yanzhi Wang
Xiang Chen
AAML
54
56
0
10 May 2019
The sharp, the flat and the shallow: Can weakly interacting agents learn to escape bad minima?
N. Kantas
P. Parpas
G. Pavliotis
ODL
30
8
0
10 May 2019
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel S. Park
Jascha Narain Sohl-Dickstein
Quoc V. Le
Samuel L. Smith
96
57
0
09 May 2019
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Colin Wei
Tengyu Ma
85
110
0
09 May 2019
Full-Gradient Representation for Neural Network Visualization
Suraj Srinivas
François Fleuret
MILM
FAtt
95
276
0
02 May 2019
SWALP : Stochastic Weight Averaging in Low-Precision Training
Guandao Yang
Tianyi Zhang
Polina Kirichenko
Junwen Bai
A. Wilson
Christopher De Sa
85
97
0
26 Apr 2019
Improved visible to IR image transformation using synthetic data augmentation with cycle-consistent adversarial networks
Kyongsik Yun
Kevin Yu
Joseph Osborne
S. Eldin
Luan Nguyen
Alexander Huyen
Thomas Lu
GAN
29
19
0
25 Apr 2019
Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel
Aymeric Dieuleveut
FedML
61
27
0
25 Apr 2019
HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning
O. Gencoglu
M. Gils
E. Guldogan
Chamin Morikawa
Mehmet Süzen
M. Gruber
J. Leinonen
H. Huttunen
98
36
0
16 Apr 2019
MxML: Mixture of Meta-Learners for Few-Shot Classification
Minseop Park
Jungtaek Kim
Saehoon Kim
Yanbin Liu
Seungjin Choi
OODD
33
8
0
11 Apr 2019
A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics
Weinan E
Chao Ma
Lei Wu
MLT
77
124
0
08 Apr 2019
Information Bottleneck and its Applications in Deep Learning
Hassan Hafez-Kolahi
S. Kasaei
53
19
0
07 Apr 2019
Parallelizable Stack Long Short-Term Memory
Shuoyang Ding
Philipp Koehn
51
3
0
06 Apr 2019
DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis
Sangkug Lym
Donghyuk Lee
Mike O'Connor
Niladrish Chatterjee
M. Erez
78
37
0
02 Apr 2019
Lautum Regularization for Semi-supervised Transfer Learning
Daniel Jakubovitz
M. Rodrigues
Raja Giryes
67
4
0
02 Apr 2019
Why ResNet Works? Residuals Generalize
Fengxiang He
Tongliang Liu
Dacheng Tao
65
253
0
02 Apr 2019
Optimal Obfuscation Mechanisms via Machine Learning
Marco Romanelli
K. Chatzikokolakis
C. Palamidessi
AAML
53
12
0
01 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
292
1,000
0
01 Apr 2019
Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
Mingchen Li
Mahdi Soltanolkotabi
Samet Oymak
NoLa
129
355
0
27 Mar 2019
Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Nikoli Dryden
N. Maruyama
Tom Benson
Tim Moon
M. Snir
B. Van Essen
69
49
0
15 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma
Gabe Montague
Jiayu Ye
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
49
24
0
14 Mar 2019
Communication-efficient distributed SGD with Sketching
Nikita Ivkin
D. Rothchild
Enayat Ullah
Vladimir Braverman
Ion Stoica
R. Arora
FedML
69
200
0
12 Mar 2019
SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems
Beidi Chen
Tharun Medini
James Farwell
Sameh Gobriel
Charlie Tai
Anshumali Shrivastava
85
105
0
07 Mar 2019
Positively Scale-Invariant Flatness of ReLU Neural Networks
Mingyang Yi
Qi Meng
Wei-neng Chen
Zhi-Ming Ma
Tie-Yan Liu
76
18
0
06 Mar 2019
Implicit Regularization in Over-parameterized Neural Networks
M. Kubo
Ryotaro Banno
Hidetaka Manabe
Masataka Minoji
76
23
0
05 Mar 2019
Deep Learning Based Motion Planning For Autonomous Vehicle Using Spatiotemporal LSTM Network
Zhengwei Bai
B. Cai
Shangguan Wei
Linguo Chai
31
27
0
05 Mar 2019
Multilingual Neural Machine Translation with Knowledge Distillation
Xu Tan
Yi Ren
Di He
Tao Qin
Zhou Zhao
Tie-Yan Liu
112
250
0
27 Feb 2019
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Yeming Wen
Kevin Luk
Maxime Gazeau
Guodong Zhang
Harris Chan
Jimmy Ba
ODL
71
22
0
21 Feb 2019
A Little Is Enough: Circumventing Defenses For Distributed Learning
Moran Baruch
Gilad Baruch
Yoav Goldberg
FedML
65
514
0
16 Feb 2019
Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
Hesham Mostafa
Xin Wang
114
315
0
15 Feb 2019
Training on the Edge: The why and the how
Navjot Kukreja
Alena Shilova
Olivier Beaumont
Jan Huckelheim
N. Ferrier
P. Hovland
Gerard Gorman
49
36
0
13 Feb 2019
Uniform convergence may be unable to explain generalization in deep learning
Vaishnavh Nagarajan
J. Zico Kolter
MoMe
AI4CE
98
317
0
13 Feb 2019
Towards moderate overparameterization: global convergence guarantees for training shallow neural networks
Samet Oymak
Mahdi Soltanolkotabi
63
323
0
12 Feb 2019
Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
Ruqi Zhang
Chunyuan Li
Jianyi Zhang
Changyou Chen
A. Wilson
BDL
88
278
0
11 Feb 2019
A Simple Baseline for Bayesian Uncertainty in Deep Learning
Wesley J. Maddox
T. Garipov
Pavel Izmailov
Dmitry Vetrov
A. Wilson
BDL
UQCV
117
810
0
07 Feb 2019
A Scale Invariant Flatness Measure for Deep Network Minima
Akshay Rangamani
Nam H. Nguyen
Abhishek Kumar
Dzung Phan
Sang H. Chin
T. Tran
ODL
88
31
0
06 Feb 2019
Are All Layers Created Equal?
Chiyuan Zhang
Samy Bengio
Y. Singer
111
140
0
06 Feb 2019
Distribution-Dependent Analysis of Gibbs-ERM Principle
Ilja Kuzborskij
Nicolò Cesa-Bianchi
Csaba Szepesvári
74
20
0
05 Feb 2019
Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He
Gao Huang
Yang Yuan
ODL
MLT
79
150
0
02 Feb 2019
Episodic Training for Domain Generalization
Da Li
Jianshu Zhang
Yongxin Yang
Cong Liu
Yi-Zhe Song
Timothy M. Hospedales
OOD
144
450
0
31 Jan 2019