1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,554 papers shown
Communication-efficient Decentralized Machine Learning over Heterogeneous Networks
Pan Zhou
Qian Lin
Dumitrel Loghin
Beng Chin Ooi
Yuncheng Wu
Hongfang Yu
80
37
0
12 Sep 2020
Achieving Adversarial Robustness via Sparsity
Shu-Fan Wang
Ningyi Liao
Liyao Xiang
Nanyang Ye
Quanshi Zhang
AAML
58
16
0
11 Sep 2020
Self-Adaptive Physics-Informed Neural Networks using a Soft Attention Mechanism
L. McClenny
U. Braga-Neto
PINN
96
464
0
07 Sep 2020
S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung
Iksoo Choi
Jinhwan Park
Seokhyun Choi
Sungho Shin
ODL
58
7
0
05 Sep 2020
Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance
A. Lohn
51
13
0
02 Sep 2020
Extreme Memorization via Scale of Initialization
Harsh Mehta
Ashok Cutkosky
Behnam Neyshabur
60
20
0
31 Aug 2020
Predicting Training Time Without Training
Luca Zancato
Alessandro Achille
Avinash Ravichandran
Rahul Bhotika
Stefano Soatto
156
24
0
28 Aug 2020
Adversarially Robust Learning via Entropic Regularization
Gauri Jagatap
Ameya Joshi
A. B. Chowdhury
S. Garg
Chinmay Hegde
OOD
123
11
0
27 Aug 2020
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao
Sang Keun Choe
Suhas Jayaram Subramanya
Willie Neiswanger
Qirong Ho
Hao Zhang
G. Ganger
Eric Xing
VLM
77
183
0
27 Aug 2020
Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra
Vardan Papyan
64
80
0
27 Aug 2020
What is being transferred in transfer learning?
Behnam Neyshabur
Hanie Sedghi
Chiyuan Zhang
141
530
0
26 Aug 2020
HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks
Sam Verboven
M. H. Chaudhary
Jeroen Berrevoets
Wouter Verbeke
52
7
0
26 Aug 2020
Noise-induced degeneration in online learning
Yuzuru Sato
Daiji Tsutsui
A. Fujiwara
42
2
0
24 Aug 2020
XNAP: Making LSTM-based Next Activity Predictions Explainable by Using LRP
Sven Weinzierl
Sandra Zilker
Jens Brunk
K. Revoredo
Martin Matzner
J. Becker
62
27
0
18 Aug 2020
Adversarial Concurrent Training: Optimizing Robustness and Accuracy Trade-off of Deep Neural Networks
Elahe Arani
F. Sarfraz
Bahram Zonooz
AAML
60
9
0
16 Aug 2020
Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training
Geoffrey X. Yu
Tovi Grossman
Gennady Pekhimenko
41
17
0
15 Aug 2020
BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Y. Kim
Wonpyo Park
Jongju Shin
CVBM
139
51
0
15 Aug 2020
Optimizing Information Loss Towards Robust Neural Networks
Philip Sperl
Konstantin Böttinger
AAML
45
3
0
07 Aug 2020
Neural Complexity Measures
Yoonho Lee
Juho Lee
Sung Ju Hwang
Eunho Yang
Seungjin Choi
85
9
0
07 Aug 2020
Communication-Efficient and Distributed Learning Over Wireless Networks: Principles and Applications
Jihong Park
S. Samarakoon
Anis Elgabli
Joongheon Kim
M. Bennis
Seong-Lyun Kim
Mérouane Debbah
102
164
0
06 Aug 2020
Wasserstein-based Projections with Applications to Inverse Problems
Howard Heaton
Samy Wu Fung
A. Lin
Stanley Osher
W. Yin
58
3
0
05 Aug 2020
Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
Yossi Arjevani
M. Field
55
16
0
04 Aug 2020
MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
Jun Shu
Yanwen Zhu
Qian Zhao
Zongben Xu
Deyu Meng
75
7
0
29 Jul 2020
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao
Chang-Wei Shi
Yin-Peng Xie
Wu-Jun Li
ODL
80
10
0
28 Jul 2020
AutoClip: Adaptive Gradient Clipping for Source Separation Networks
Prem Seetharaman
Gordon Wichern
Bryan Pardo
Jonathan Le Roux
67
34
0
25 Jul 2020
Neural networks with late-phase weights
J. Oswald
Seijin Kobayashi
Alexander Meulemans
Christian Henning
Benjamin Grewe
João Sacramento
94
35
0
25 Jul 2020
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Yosuke Oyama
N. Maruyama
Nikoli Dryden
Erin McCarthy
P. Harrington
J. Balewski
Satoshi Matsuoka
Peter Nugent
B. Van Essen
3DV
AI4CE
71
37
0
25 Jul 2020
Linear discriminant initialization for feed-forward neural networks
Marissa Masden
D. Sinha
FedML
40
3
0
24 Jul 2020
Deforming the Loss Surface
Liangming Chen
Long Jin
Xiujuan Du
Shuai Li
Mei Liu
ODL
21
0
0
24 Jul 2020
Randomized Automatic Differentiation
Deniz Oktay
N. McGreivy
Joshua Aduol
Alex Beatson
Ryan P. Adams
ODL
65
27
0
20 Jul 2020
On regularization of gradient descent, layer imbalance and flat minima
Boris Ginsburg
6
2
0
18 Jul 2020
Understanding Implicit Regularization in Over-Parameterized Single Index Model
Jianqing Fan
Zhuoran Yang
Mengxin Yu
81
18
0
16 Jul 2020
Data-driven effective model shows a liquid-like deep learning
Wenxuan Zou
Haiping Huang
58
2
0
16 Jul 2020
Explicit Regularisation in Gaussian Noise Injections
A. Camuto
M. Willetts
Umut Simsekli
Stephen J. Roberts
Chris Holmes
100
59
0
14 Jul 2020
Beyond Graph Neural Networks with Lifted Relational Neural Networks
Gustav Sourek
F. Železný
Ondrej Kuzelka
NAI
131
18
0
13 Jul 2020
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang
G. Agrawal
54
5
0
13 Jul 2020
A Study of Gradient Variance in Deep Learning
Fartash Faghri
David Duvenaud
David J. Fleet
Jimmy Ba
FedML
ODL
59
27
0
09 Jul 2020
Distributed Training of Deep Learning Models: A Taxonomic Perspective
M. Langer
Zhen He
W. Rahayu
Yanbo Xue
70
78
0
08 Jul 2020
DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang
Cengguang Zhang
Liu Yang
Kai Chen
Kun Tan
68
12
0
07 Jul 2020
Predicting Porosity, Permeability, and Tortuosity of Porous Media from Images by Deep Learning
K. Graczyk
M. Matyka
3DV
AI4CE
95
117
0
06 Jul 2020
Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
Yi Chen
Jinglin Chen
Jing-rong Dong
Jian-wei Peng
Zhaoran Wang
82
33
0
04 Jul 2020
Variance reduction for Riemannian non-convex optimization with batch size adaptation
Andi Han
Junbin Gao
85
5
0
03 Jul 2020
The Global Landscape of Neural Networks: An Overview
Ruoyu Sun
Dawei Li
Shiyu Liang
Tian Ding
R. Srikant
84
88
0
02 Jul 2020
Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning
Lionel Blondé
Pablo Strasser
Alexandros Kalousis
90
22
0
28 Jun 2020
Is SGD a Bayesian sampler? Well, almost
Chris Mingard
Guillermo Valle Pérez
Joar Skalse
A. Louis
BDL
77
53
0
26 Jun 2020
On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith
Erich Elsen
Soham De
MLT
62
100
0
26 Jun 2020
Effective Elastic Scaling of Deep Learning Workloads
Vaibhav Saxena
K.R. Jayaram
Saurav Basu
Yogish Sabharwal
Ashish Verma
49
9
0
24 Jun 2020
Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Qi Meng
Shiqi Gong
Wei Chen
Zhi-Ming Ma
Tie-Yan Liu
53
16
0
24 Jun 2020
Understanding Deep Architectures with Reasoning Layer
Xinshi Chen
Yufei Zhang
C. Reisinger
Le Song
AI4CE
127
7
0
24 Jun 2020
Exploiting Contextual Information with Deep Neural Networks
Ismail Elezi
50
3
0
21 Jun 2020