arXiv:1609.04836 (v2, latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. [ODL]
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (50 of 1,554 shown):
- Online Knowledge Distillation with Diverse Peers (01 Dec 2019). Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, Chun-Yen Chen. [FedML]
- A Reparameterization-Invariant Flatness Measure for Deep Neural Networks (29 Nov 2019). Henning Petzka, Linara Adilova, Michael Kamp, C. Sminchisescu. [ODL]
- On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks (29 Nov 2019). Umut Simsekli, Mert Gurbuzbalaban, T. H. Nguyen, G. Richard, Levent Sagun.
- Auto-Precision Scaling for Distributed Deep Learning (20 Nov 2019). Ruobing Han, J. Demmel, Yang You.
- Information-Theoretic Local Minima Characterization and Regularization (19 Nov 2019). Zhiwei Jia, Hao Su.
- Signed Input Regularization (16 Nov 2019). Saeid Asgari Taghanaki, Kumar Abhishek, Ghassan Hamarneh. [AAML]
- Information-Theoretic Perspective of Federated Learning (15 Nov 2019). Linara Adilova, Julia Rosenzweig, Michael Kamp. [FedML]
- Optimal Mini-Batch Size Selection for Fast Gradient Descent (15 Nov 2019). M. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, V. Salapura.
- MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent (08 Nov 2019). Karl Bäckström, Marina Papatriantafilou, P. Tsigas.
- Small-GAN: Speeding Up GAN Training Using Core-sets (29 Oct 2019). Samarth Sinha, Hang Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena. [GAN]
- E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings (29 Oct 2019). Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang. [MQ]
- Neural Density Estimation and Likelihood-free Inference (29 Oct 2019). George Papamakarios. [BDL, DRL]
- A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs (25 Oct 2019). Koyel Mukherjee, Alind Khare, Ashish Verma.
- Diametrical Risk Minimization: Theory and Computations (24 Oct 2019). Matthew Norton, J. Royset.
- Explicitly Bayesian Regularizations in Deep Learning (22 Oct 2019). Xinjie Lan, Kenneth Barner. [UQCV, BDL, AI4CE]
- Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic (18 Oct 2019). Matteo Sordello, Niccolò Dalmasso, Hangfeng He, Weijie Su.
- On Warm-Starting Neural Network Training (18 Oct 2019). Jordan T. Ash, Ryan P. Adams. [AI4CE]
- Improving the convergence of SGD through adaptive batch sizes (18 Oct 2019). Scott Sievert, Zachary B. Charles. [ODL]
- KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment (14 Oct 2019). Vlad Hosu, Hanhe Lin, T. Szirányi, Dietmar Saupe.
- Emergent properties of the local geometry of neural loss landscapes (14 Oct 2019). Stanislav Fort, Surya Ganguli.
- Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin (09 Oct 2019). Colin Wei, Tengyu Ma. [AAML, OOD]
- Parallelizing Training of Deep Generative Models on Massive Scientific Datasets (05 Oct 2019). S. A. Jacobs, B. Van Essen, D. Hysom, Jae-Seung Yeom, Tim Moon, ..., J. Gaffney, Tom Benson, Peter B. Robinson, L. Peterson, B. Spears. [BDL, AI4CE]
- Distributed Learning of Deep Neural Networks using Independent Subnet Training (04 Oct 2019). John Shelton Hyatt, Cameron R. Wolfe, Michael Lee, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine. [OOD]
- Generalization Bounds for Convolutional Neural Networks (03 Oct 2019). Shan Lin, Jingwei Zhang. [MLT]
- Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory (01 Oct 2019). Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein.
- How noise affects the Hessian spectrum in overparameterized neural networks (01 Oct 2019). Ming-Bo Wei, D. Schwab.
- At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? (26 Sep 2019). Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry.
- GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks (26 Sep 2019). Avraam Chatzimichailidis, Franz-Josef Pfreundt, N. Gauger, J. Keuper.
- Towards Understanding the Transferability of Deep Representations (26 Sep 2019). Hong Liu, Mingsheng Long, Jianmin Wang, Michael I. Jordan.
- A Closer Look at Domain Shift for Deep Learning in Histopathology (25 Sep 2019). Karin Stacke, Gabriel Eilertsen, Jonas Unger, Claes Lundström. [OOD]
- EEG-Based Driver Drowsiness Estimation Using Feature Weighted Episodic Training (25 Sep 2019). Yuqi Cui, Yifan Xu, Dongrui Wu.
- Decentralized Markov Chain Gradient Descent (23 Sep 2019). Tao Sun, Dongsheng Li. [BDL]
- Scale MLPerf-0.6 models on Google TPU-v3 Pods (21 Sep 2019). Sameer Kumar, Victor Bitorff, Dehao Chen, Chi-Heng Chou, Blake A. Hechtman, ..., Peter Mattson, Shibo Wang, Tao Wang, Yuanzhong Xu, Zongwei Zhou.
- Understanding and Robustifying Differentiable Architecture Search (20 Sep 2019). Arber Zela, T. Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter. [OOD, AAML]
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019). Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. [MoE]
- Visualizing Movement Control Optimization Landscapes (17 Sep 2019). Perttu Hämäläinen, Juuso Toikka, Amin Babadi, Karen Liu.
- Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle (11 Sep 2019). Michael Kaufmann, K. Kourtis, Celestine Mendler-Dünner, Adrian Schüpbach, Thomas Parnell.
- Towards Understanding the Importance of Noise in Training Neural Networks (07 Sep 2019). Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, T. Zhao. [MLT]
- Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field Approximation (06 Sep 2019). Konstantinos Pitas.
- LCA: Loss Change Allocation for Neural Network Training (03 Sep 2019). Janice Lan, Rosanne Liu, Hattie Zhou, J. Yosinski.
- Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation (02 Sep 2019). Junya Ono, Masao Utiyama, Eiichiro Sumita. [AIMat, AI4CE]
- Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective (28 Aug 2019). Guan-Horng Liu, Evangelos A. Theodorou. [AI4CE]
- Towards Better Generalization: BP-SVRG in Training Deep Neural Networks (18 Aug 2019). Hao Jin, Dachao Lin, Zhihua Zhang. [ODL]
- Regularizing CNN Transfer Learning with Randomised Regression (16 Aug 2019). Yang Zhong, A. Maki.
- Visualizing and Understanding the Effectiveness of BERT (15 Aug 2019). Y. Hao, Li Dong, Furu Wei, Ke Xu.
- Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency (12 Aug 2019). Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry.
- Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise (12 Aug 2019). Senwei Liang, Zhongzhan Huang, Mingfu Liang, Haizhao Yang.
- Progressive Transfer Learning (07 Aug 2019). Zhengxu Yu, Long Wei, Zhongming Jin, Jianqiang Huang, Deng Cai, Xiansheng Hua. [VLM]
- How Does Learning Rate Decay Help Modern Neural Networks? (05 Aug 2019). Kaichao You, Mingsheng Long, Jianmin Wang, Michael I. Jordan.
- On the Existence of Simpler Machine Learning Models (05 Aug 2019). Lesia Semenova, Cynthia Rudin, Ronald E. Parr.