1609.04836
v1
v2 (latest)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Title
The Dynamics of Learning: A Random Matrix Approach
Zhenyu Liao
Romain Couillet
AI4CE
69
43
0
30 May 2018
How Does Batch Normalization Help Optimization?
Shibani Santurkar
Dimitris Tsipras
Andrew Ilyas
Aleksander Madry
ODL
136
1,548
0
29 May 2018
Distilling Knowledge for Search-based Structured Prediction
Yijia Liu
Wanxiang Che
Huaipeng Zhao
Bing Qin
Ting Liu
51
22
0
29 May 2018
Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling
Rainer Kelz
Gerhard Widmer
NoLa
26
4
0
28 May 2018
A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition
Alireza Sepas-Moghaddam
M. A. Haque
P. Correia
Kamal Nasrollahi
T. Moeslund
F. Pereira
CVBM
51
36
0
25 May 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
205
1,071
0
24 May 2018
Input and Weight Space Smoothing for Semi-supervised Learning
Safa Cicek
Stefano Soatto
47
6
0
23 May 2018
Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle Pérez
Chico Q. Camargo
A. Louis
MLT
AI4CE
122
232
0
22 May 2018
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Joeri Hermans
Gilles Louppe
40
5
0
22 May 2018
Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An
Jian-wei Lu
Lexing Ying
77
79
0
21 May 2018
Never look back - A modified EnKF method and its application to the training of neural networks without back propagation
E. Haber
F. Lucka
Lars Ruthotto
62
32
0
21 May 2018
SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen
Yandan Wang
Feng Yan
Cong Xu
Chunpeng Wu
Yiran Chen
H. Li
79
51
0
21 May 2018
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Liang Luo
Jacob Nelson
Luis Ceze
Amar Phanishayee
Arvind Krishnamurthy
154
121
0
21 May 2018
DNN or k-NN: That is the Generalize vs. Memorize Question
Gilad Cohen
Guillermo Sapiro
Raja Giryes
60
38
0
17 May 2018
Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
Minjie Wang
Chien-chin Huang
Jinyang Li
FedML
56
25
0
10 May 2018
On Visual Hallmarks of Robustness to Adversarial Malware
Alex Huang
Abdullah Al-Dujaili
Erik Hemberg
Una-May O’Reilly
AAML
69
7
0
09 May 2018
SaaS: Speed as a Supervisor for Semi-supervised Learning
Safa Cicek
Alhussein Fawzi
Stefano Soatto
BDL
85
19
0
02 May 2018
SHADE: Information Based Regularization for Deep Learning
Michael Blot
Thomas Robert
Nicolas Thome
Matthieu Cord
62
12
0
29 Apr 2018
HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering
Daniel Gribel
Thibaut Vidal
37
41
0
25 Apr 2018
Path Planning in Support of Smart Mobility Applications using Generative Adversarial Networks
M. Mohammadi
Ala I. Al-Fuqaha
Jun-Seok Oh
GAN
69
24
0
23 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
80
669
0
20 Apr 2018
Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Wenda Zhou
Victor Veitch
Morgane Austern
Ryan P. Adams
Peter Orbanz
75
215
0
16 Apr 2018
DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction
Huifeng Guo
Ruiming Tang
Yunming Ye
Zhenguo Li
Xiuqiang He
Zhenhua Dong
158
64
0
12 Apr 2018
Large scale distributed neural network training through online distillation
Rohan Anil
Gabriel Pereyra
Alexandre Passos
Róbert Ormándi
George E. Dahl
Geoffrey E. Hinton
FedML
336
408
0
09 Apr 2018
The Loss Surface of XOR Artificial Neural Networks
D. Mehta
Xiaojun Zhao
Edgar A. Bernal
D. Wales
156
19
0
06 Apr 2018
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
86
312
0
01 Apr 2018
Online Second Order Methods for Non-Convex Stochastic Optimizations
Xi-Lin Li
OffRL
ODL
41
4
0
26 Mar 2018
On the Local Minima of the Empirical Risk
Chi Jin
Lydia T. Liu
Rong Ge
Michael I. Jordan
FedML
147
56
0
25 Mar 2018
Multiple Sclerosis Lesion Segmentation from Brain MRI via Fully Convolutional Neural Networks
Snehashis Roy
J. Butman
Daniel Reich
P. Calabresi
Dzung L. Pham
MedIm
50
86
0
24 Mar 2018
A high-bias, low-variance introduction to Machine Learning for physicists
Pankaj Mehta
Marin Bukov
Ching-Hao Wang
A. G. Day
C. Richardson
Charles K. Fisher
D. Schwab
AI4CE
119
880
0
23 Mar 2018
Gradient Descent Quantizes ReLU Network Features
Hartmut Maennel
Olivier Bousquet
Sylvain Gelly
MLT
66
82
0
22 Mar 2018
Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks
A. Bustos
A. Pertusa
25
28
0
22 Mar 2018
Assessing Shape Bias Property of Convolutional Neural Networks
Hossein Hosseini
Baicen Xiao
Mayoore S. Jaiswal
Radha Poovendran
63
36
0
21 Mar 2018
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Marco Baity-Jesi
Levent Sagun
Mario Geiger
S. Spigler
Gerard Ben Arous
C. Cammarota
Yann LeCun
Matthieu Wyart
Giulio Biroli
AI4CE
112
115
0
19 Mar 2018
On the importance of single directions for generalization
Ari S. Morcos
David Barrett
Neil C. Rabinowitz
M. Botvinick
97
333
0
19 Mar 2018
On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi
Praneeth Netrapalli
Prateek Jain
Sham Kakade
ODL
90
120
0
15 Mar 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
149
1,673
0
14 Mar 2018
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
R. Campbell
51
199
0
08 Mar 2018
Essentially No Barriers in Neural Network Energy Landscape
Felix Dräxler
K. Veschgini
M. Salmhofer
Fred Hamprecht
MoMe
136
436
0
02 Mar 2018
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu
Jingfeng Wu
Ting Yu
Lei Wu
Jin Ma
65
40
0
01 Mar 2018
Neural Inverse Rendering for General Reflectance Photometric Stereo
Tatsunori Taniai
Takanori Maehara
130
105
0
28 Feb 2018
Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion
Randall Balestriero
H. Glotin
Richard Baraniuk
BDL
110
5
0
27 Feb 2018
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
T. Garipov
Pavel Izmailov
Dmitrii Podoprikhin
Dmitry Vetrov
A. Wilson
UQCV
114
758
0
27 Feb 2018
Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior
Akshat Dave
Anil Kumar Vadathya
R. Subramanyam
R. Baburajan
Kaushik Mitra
61
22
0
27 Feb 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
96
119
0
24 Feb 2018
Sensitivity and Generalization in Neural Networks: an Empirical Study
Roman Novak
Yasaman Bahri
Daniel A. Abolafia
Jeffrey Pennington
Jascha Narain Sohl-Dickstein
AAML
99
442
0
23 Feb 2018
Characterizing Implicit Bias in Terms of Optimization Geometry
Suriya Gunasekar
Jason D. Lee
Daniel Soudry
Nathan Srebro
AI4CE
90
413
0
22 Feb 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao
A. Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
88
167
0
22 Feb 2018
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini
Chang-rui Liu
Ulfar Erlingsson
Jernej Kos
Basel Alomair
177
1,151
0
22 Feb 2018
Improved Techniques For Weakly-Supervised Object Localization
Junsuk Choe
J. Park
Hyunjung Shim
WSOL
62
7
0
22 Feb 2018