arXiv:1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (50 of 1,554 shown):
Smoothness Analysis of Adversarial Training
Sekitoshi Kanai, Masanori Yamada, Hiroshi Takahashi, Yuki Yamanaka, Yasutoshi Ida · AAML · 95 / 6 / 0 · 02 Mar 2021

Acceleration via Fractal Learning Rate Schedules
Naman Agarwal, Surbhi Goel, Cyril Zhang · 76 / 18 / 0 · 01 Mar 2021

Siamese Labels Auxiliary Learning
Wenrui Gan, Zhulin Liu, Chong Chen, Tong Zhang · 35 / 2 / 0 · 27 Feb 2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar · ODL · 131 / 279 / 0 · 26 Feb 2021

Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, A. Wilson · UQCV · 126 / 70 / 0 · 25 Feb 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 104 / 80 / 0 · 24 Feb 2021

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization
Tianyi Liu, Yan Li, S. Wei, Enlu Zhou, T. Zhao · 65 / 13 / 0 · 24 Feb 2021

Inductive Bias of Multi-Channel Linear Convolutional Networks with Bounded Weight Norm
Meena Jagadeesan, Ilya P. Razenshteyn, Suriya Gunasekar · 113 / 21 / 0 · 24 Feb 2021

The Promises and Pitfalls of Deep Kernel Learning
Sebastian W. Ober, C. Rasmussen, Mark van der Wilk · UQCV, BDL · 82 / 109 / 0 · 24 Feb 2021

ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon, Jeongseop Kim, Hyunseong Park, I. Choi · 124 / 291 / 0 · 23 Feb 2021

The Uncanny Similarity of Recurrence and Depth
Avi Schwarzschild, Arjun Gupta, Amin Ghiasi, Micah Goldblum, Tom Goldstein · 83 / 10 / 0 · 22 Feb 2021

Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation
Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun · 62 / 11 / 0 · 22 Feb 2021

Non-Convex Optimization with Spectral Radius Regularization
Adam Sandler, Diego Klabjan, Yuan Luo · ODL · 45 / 1 / 0 · 22 Feb 2021

Formal Language Theory Meets Modern NLP
William Merrill · AI4CE, NAI · 112 / 13 / 0 · 19 Feb 2021

Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence
Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, P. Tsigas · 67 / 5 / 0 · 17 Feb 2021

SWAD: Domain Generalization by Seeking Flat Minima
Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park · MoMe · 311 / 460 / 0 · 17 Feb 2021

Generating Structured Adversarial Attacks Using Frank-Wolfe Method
Ehsan Kazemi, Thomas Kerdreux, Liquang Wang · AAML, DiffM · 48 / 1 / 0 · 15 Feb 2021

Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
Frank Schneider, Felix Dangel, Philipp Hennig · 74 / 10 / 0 · 12 Feb 2021

Noisy Recurrent Neural Networks
Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney · 93 / 54 / 0 · 09 Feb 2021

Consensus Control for Decentralized Deep Learning
Lingjing Kong, Tao R. Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich · 53 / 79 / 0 · 09 Feb 2021

SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality
Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette · 59 / 35 / 0 · 08 Feb 2021

Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise
Xingyu Wang, Sewoong Oh, C. Rhee · 75 / 17 / 0 · 08 Feb 2021

Adversarial Training Makes Weight Loss Landscape Sharper in Logistic Regression
Masanori Yamada, Sekitoshi Kanai, Tomoharu Iwata, Tomokatsu Takahashi, Yuki Yamanaka, Hiroshi Takahashi, Atsutoshi Kumagai · AAML · 124 / 9 / 0 · 05 Feb 2021

Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
Shang Wang, Peiming Yang, Yuxuan Zheng, Xuelong Li, Gennady Pekhimenko · 82 / 22 / 0 · 03 Feb 2021

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent
Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy · 128 / 90 / 0 · 01 Feb 2021

Exploring the Geometry and Topology of Neural Network Loss Landscapes
Stefan Horoi, Je-chun Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy · 45 / 13 / 0 · 31 Jan 2021

Modelling Sovereign Credit Ratings: Evaluating the Accuracy and Driving Factors using Machine Learning Techniques
B. Overes, Michel van der Wel · 17 / 6 / 0 · 29 Jan 2021

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L. Smith, Benoit Dherin, David Barrett, Soham De · MLT · 62 / 204 / 0 · 28 Jan 2021

cGANs for Cartoon to Real-life Images
P. Rajput, Kanya Satis, Sonnya Dellarosa, Wenxuan Huang, Obinna Agba · GAN · 55 / 2 / 0 · 24 Jan 2021

Predicting the Mechanical Properties of Biopolymer Gels Using Neural Networks Trained on Discrete Fiber Network Data
Yue Leng, Vahidullah Tac, S. Calve, A. B. Tepole · 86 / 32 / 0 · 23 Jan 2021

A shallow neural model for relation prediction
Caglar Demir, Diego Moussallem, A. N. Ngomo · 45 / 11 / 0 · 22 Jan 2021

Robustness to Augmentations as a Generalization metric
Sumukh K Aithal, D. Kashyap, Natarajan Subramanyam · OOD · 36 / 18 / 0 · 16 Jan 2021

BN-invariant sharpness regularizes the training model to better generalization
Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu · 128 / 3 / 0 · 08 Jan 2021

Accelerating Training of Batch Normalization: A Manifold Perspective
Mingyang Yi · 24 / 3 / 0 · 08 Jan 2021

A spin-glass model for the loss surfaces of generative adversarial networks
Nicholas P. Baskerville, J. Keating, F. Mezzadri, J. Najnudel · GAN · 88 / 12 / 0 · 07 Jan 2021

Topological obstructions in neural networks learning
S. Barannikov, Daria Voronkova, I. Trofimov, Alexander Korotin, Grigorii Sotnikov, Evgeny Burnaev · 39 / 6 / 0 · 31 Dec 2020

Optimizing Deeper Transformers on Small Datasets
Peng Xu, Dhruv Kumar, Wei Yang, Wenjie Zi, Keyi Tang, Chenyang Huang, Jackie C.K. Cheung, S. Prince, Yanshuai Cao · AI4CE · 109 / 69 / 0 · 30 Dec 2020

Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Sangho Yeo, Minho Bae, Minjoong Jeong, Oh-Kyoung Kwon, Sangyoon Oh · 57 / 3 / 0 · 30 Dec 2020

Mathematical Models of Overparameterized Neural Networks
Cong Fang, Hanze Dong, Tong Zhang · 181 / 23 / 0 · 27 Dec 2020

Understanding Decoupled and Early Weight Decay
Johan Bjorck, Kilian Q. Weinberger, Carla P. Gomes · 61 / 25 / 0 · 27 Dec 2020

Recent advances in deep learning theory
Fengxiang He, Dacheng Tao · AI4CE · 130 / 51 / 0 · 20 Dec 2020

Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues
Ricard Durall, Avraam Chatzimichailidis, P. Labus, J. Keuper · GAN · 77 / 62 / 0 · 17 Dec 2020

Study on the Large Batch Size Training of Neural Networks Based on the Second Order Gradient
Fengli Gao, Huicai Zhong · ODL · 35 / 10 / 0 · 16 Dec 2020

DeepLesionBrain: Towards a broader deep-learning generalization for multiple sclerosis lesion segmentation
R. A. Kamraoui, Vinh-Thong Ta, T. Tourdias, Boris Mansencal, J. V. Manjón, Pierrick Coupé · OOD · 120 / 54 / 0 · 14 Dec 2020

Warm Starting CMA-ES for Hyperparameter Optimization
Masahiro Nomura, Shuhei Watanabe, Youhei Akimoto, Yoshihiko Ozaki, Masaki Onishi · 93 / 43 / 0 · 13 Dec 2020

Enhance Convolutional Neural Networks with Noise Incentive Block
Menghan Xia, Yi Wang, Chu Han, T. Wong · 40 / 1 / 0 · 09 Dec 2020

Generalization bounds for deep learning
Guillermo Valle Pérez, A. Louis · BDL · 82 / 45 / 0 · 07 Dec 2020

A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, V. Balasubramanian · 89 / 44 / 0 · 07 Dec 2020

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
Kangqiao Liu, Liu Ziyin, Masakuni Ueda · MLT · 149 / 39 / 0 · 07 Dec 2020

Why Unsupervised Deep Networks Generalize
Anita de Mello Koch, E. Koch, R. Koch · OOD · 44 / 8 / 0 · 07 Dec 2020