On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016 · arXiv:1609.04836 · [ODL]
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (50 of 1,554 shown)
Each row: title · authors · [topic tags, where present] · the listing's three numeric counters · date.
How do SGD hyperparameters in natural training affect adversarial robustness? · Sandesh Kamath, Amit Deshpande, K. Subrahmanyam · [AAML] · 28 / 3 / 0 · 20 Jun 2020
What Do Neural Networks Learn When Trained With Random Labels? · Hartmut Maennel, Ibrahim Alabdulmohsin, Ilya O. Tolstikhin, R. Baldock, Olivier Bousquet, Sylvain Gelly, Daniel Keysers · [FedML] · 165 / 90 / 0 · 18 Jun 2020
Constraint-Based Regularization of Neural Networks · Benedict Leimkuhler, Timothée Pouchon, Tiffany J. Vlaar, Amos Storkey · 42 / 10 / 0 · 17 Jun 2020
Learning a functional control for high-frequency finance · Laura Leal, Mathieu Laurière, Charles-Albert Lehalle · [AIFin] · 59 / 20 / 0 · 17 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training · Diego Granziol, S. Zohren, Stephen J. Roberts · [ODL] · 148 / 50 / 0 · 16 Jun 2020
Flatness is a False Friend · Diego Granziol · [ODL] · 51 / 19 / 0 · 16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance · Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma · 210 / 95 / 0 · 15 Jun 2020
Feature Space Saturation during Training · Mats L. Richter, Justin Shenk, Wolf Byttner, Anders Arpteg, Mikael Huss · [FAtt] · 30 / 6 / 0 · 15 Jun 2020
The Limit of the Batch Size · Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh · 121 / 15 / 0 · 15 Jun 2020
Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD · Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun · 78 / 11 / 0 · 15 Jun 2020
On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them · Chen Liu, Mathieu Salzmann, Tao R. Lin, Ryota Tomioka, Sabine Süsstrunk · [AAML] · 134 / 82 / 0 · 15 Jun 2020
Entropic gradient descent algorithms and wide flat minima · Fabrizio Pittorino, Carlo Lucibello, Christoph Feinauer, Gabriele Perugini, Carlo Baldassi, Elizaveta Demyanenko, R. Zecchina · [ODL, MLT] · 109 / 33 / 0 · 14 Jun 2020
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures · Jeongun Ryu, Jaewoong Shin, Haebeom Lee, Sung Ju Hwang · [AAML, OOD] · 50 / 8 / 0 · 13 Jun 2020
CPR: Classifier-Projection Regularization for Continual Learning · Sungmin Cha, Hsiang Hsu, Taebaek Hwang, Flavio du Pin Calmon, Taesup Moon · [CLL] · 75 / 77 / 0 · 12 Jun 2020
Understanding the Role of Training Regimes in Continual Learning · Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh · [CLL] · 81 / 228 / 0 · 12 Jun 2020
STL-SGD: Speeding Up Local SGD with Stagewise Communication Period · Shuheng Shen, Yifei Cheng, Jingchang Liu, Linli Xu · [LRM] · 70 / 7 / 0 · 11 Jun 2020
Sketchy Empirical Natural Gradient Methods for Deep Learning · Minghan Yang, Dong Xu, Zaiwen Wen, Mengyun Chen, Pengxiang Xu · 37 / 13 / 0 · 10 Jun 2020
Extrapolation for Large-batch Training in Deep Learning · Tao R. Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi · 96 / 36 / 0 · 10 Jun 2020
Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption · Xu Sun, Zhiyuan Zhang, Xuancheng Ren, Ruixuan Luo, Liangyou Li · 68 / 39 / 0 · 10 Jun 2020
On the Effectiveness of Regularization Against Membership Inference Attacks · Yigitcan Kaya, Sanghyun Hong, Tudor Dumitras · 87 / 28 / 0 · 09 Jun 2020
The Heavy-Tail Phenomenon in SGD · Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu · 59 / 130 / 0 · 08 Jun 2020
Speedy Performance Estimation for Neural Architecture Search · Binxin Ru, Clare Lyle, Lisa Schut, M. Fil, Mark van der Wilk, Y. Gal · 100 / 37 / 0 · 08 Jun 2020
Efficient AutoML Pipeline Search with Matrix and Tensor Factorization · Chengrun Yang, Jicong Fan, Ziyang Wu, Madeleine Udell · 70 / 9 / 0 · 07 Jun 2020
Structure preserving deep learning · E. Celledoni, Matthias Joachim Ehrhardt, Christian Etmann, R. McLachlan, B. Owren, Carola-Bibiane Schönlieb, Ferdia Sherry · [AI4CE] · 119 / 44 / 0 · 05 Jun 2020
Scaling Distributed Training with Adaptive Summation · Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum · 16 / 9 / 0 · 04 Jun 2020
Sparse Perturbations for Improved Convergence in Stochastic Zeroth-Order Optimization · Mayumi Ohta, Nathaniel Berger, Artem Sokolov, Stefan Riezler · [ODL] · 48 / 9 / 0 · 02 Jun 2020
A heterogeneous branch and multi-level classification network for person re-identification · Jiabao Wang, Yongqian Li, Yangshuo Zhang, Zhuang Miao, Rui Zhang · 57 / 8 / 0 · 02 Jun 2020
Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization · Yangyang Xu, Yibo Xu · 69 / 25 / 0 · 31 May 2020
Inherent Noise in Gradient Based Methods · Arushi Gupta · 14 / 0 / 0 · 26 May 2020
Reliability and Performance Assessment of Federated Learning on Clinical Benchmark Data · G. Lee, S. Shin · [OOD] · 32 / 2 / 0 · 24 May 2020
Automated Copper Alloy Grain Size Evaluation Using a Deep-learning CNN · George S. Baggs, P. Guerrier, A. Loeb, Jason C. Jones · 39 / 9 / 0 · 20 May 2020
Physics-informed Neural Networks for Solving Inverse Problems of Nonlinear Biot's Equations: Batch Training · T. Kadeethum, T. Jørgensen, H. Nick · [PINN, AI4CE] · 147 / 20 / 0 · 18 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems · Preetum Nakkiran · [MLT] · 64 / 21 / 0 · 15 May 2020
Machine Learning and Deep Learning methods for predictive modelling from Raman spectra in bioprocessing · S. Rozov · 27 / 1 / 0 · 06 May 2020
An empirical comparison of deep-neural-network architectures for next activity prediction using context-enriched process event logs · Sven Weinzierl, Sandra Zilker, Jens Brunk, Kate Revoredo, A. Nguyen, Martin Matzner, Jörg Becker, Björn Eskofier · 41 / 19 / 0 · 03 May 2020
Learning to Ask Screening Questions for Job Postings · Baoxu Shi, Shan Li, Jaewon Yang, Mustafa Emre Kazdagli, Qi He · 56 / 17 / 0 · 30 Apr 2020
Pruning artificial neural networks: a way to find well-generalizing, high-entropy sharp minima · Enzo Tartaglione, Andrea Bragagnolo, Marco Grangetto · 66 / 12 / 0 · 30 Apr 2020
The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent · Xin-Yao Qian, Diego Klabjan · [ODL] · 72 / 36 / 0 · 27 Apr 2020
FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training · Sangkug Lym, M. Erez · 30 / 26 / 0 · 27 Apr 2020
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning · Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu · 71 / 27 / 0 · 27 Apr 2020
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models · Mengjie Zhao, Tao R. Lin, Fei Mi, Martin Jaggi, Hinrich Schütze · 77 / 121 / 0 · 26 Apr 2020
Generative Data Augmentation for Commonsense Reasoning · Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Swabha Swayamdipta, Ronan Le Bras, Ji-ping Wang, Chandra Bhagavatula, Yejin Choi, Doug Downey · [LRM] · 82 / 90 / 0 · 24 Apr 2020
Dark Experience for General Continual Learning: a Strong, Simple Baseline · Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, Simone Calderara · [BDL, CLL] · 91 / 928 / 0 · 15 Apr 2020
On Learning Rates and Schrödinger Operators · Bin Shi, Weijie J. Su, Michael I. Jordan · 90 / 61 / 0 · 15 Apr 2020
Stochastic batch size for adaptive regularization in deep network optimization · Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong · [ODL] · 49 / 6 / 0 · 14 Apr 2020
Adversarial Weight Perturbation Helps Robust Generalization · Dongxian Wu, Shutao Xia, Yisen Wang · [OOD, AAML] · 60 / 17 / 0 · 13 Apr 2020
Applying Cyclical Learning Rate to Neural Machine Translation · Choon Meng Lee, Jianfeng Liu, Wei Peng · [ODL] · 24 / 2 / 0 · 06 Apr 2020
Projection Pursuit Gaussian Process Regression · Gecheng Chen, Rui Tuo · [GP] · 39 / 13 / 0 · 01 Apr 2020
Robust and On-the-fly Dataset Denoising for Image Classification · Jiaming Song, Lunjia Hu, Michael Auli, Yann N. Dauphin, Tengyu Ma · [NoLa, OOD] · 87 / 13 / 0 · 24 Mar 2020
SAT: Improving Adversarial Training via Curriculum-Based Loss Smoothing · Chawin Sitawarin, S. Chakraborty, David Wagner · [AAML] · 71 / 40 / 0 · 18 Mar 2020