On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL]
arXiv:1609.04836, 15 September 2016 (latest version: v2)
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (50 of 1,554 shown)
The Benefits of Implicit Regularization from SGD in Least Squares Problems
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean Phillips Foster, Sham Kakade (10 Aug 2021)

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters
Chen Sun, Shenggui Li, Jinyue Wang, Jun Yu (08 Aug 2021)

Convergence of gradient descent for learning linear neural networks
Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege [MLT] (04 Aug 2021)

Batch Normalization Preconditioning for Neural Network Training
Susanna Lange, Kyle E. Helfrich, Qiang Ye (02 Aug 2021)

Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
Liangbin Xie, Xintao Wang, Chao Dong, Zhongang Qi, Ying Shan (02 Aug 2021)

Taxonomizing local versus global structure in neural network loss landscapes
Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney (23 Jul 2021)

Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II
Yossi Arjevani, M. Field (21 Jul 2021)

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins (19 Jul 2021)

Rethinking Graph Auto-Encoder Models for Attributed Graph Clustering
Nairouz Mrabah, Mohamed Bouguessa, M. Touati, Riadh Ksantini (19 Jul 2021)

Point-Cloud Deep Learning of Porous Media for Permeability Prediction
Ali Kashefi, T. Mukerji [3DPC, AI4CE] (18 Jul 2021)

Globally Convergent Multilevel Training of Deep Residual Networks
Alena Kopanicáková, Rolf Krause (15 Jul 2021)

AlterSGD: Finding Flat Minima for Continual Learning by Alternative Training
Zhongzhan Huang, Ming Liang, Senwei Liang, Wei He [CLL, ODL] (13 Jul 2021)

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs
Satyen Kale, Ayush Sekhari, Karthik Sridharan (11 Jul 2021)

The Bayesian Learning Rule
Mohammad Emtiyaz Khan, Håvard Rue [BDL] (09 Jul 2021)

Activated Gradients for Deep Neural Networks
Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, Mingsheng Shang [ODL, AI4CE] (09 Jul 2021)

Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE
Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, ..., Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao (02 Jul 2021)

Revisiting Knowledge Distillation: An Inheritance and Exploration Framework
Zhen Huang, Xu Shen, Jun Xing, Tongliang Liu, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua (01 Jul 2021)

Analytic Insights into Structure and Rank of Neural Network Hessian Maps
Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann [FAtt] (30 Jun 2021)

What can linear interpolation of neural network loss landscapes tell us?
Tiffany J. Vlaar, Jonathan Frankle [MoMe] (30 Jun 2021)

Never Go Full Batch (in Stochastic Convex Optimization)
I Zaghloul Amir, Y. Carmon, Tomer Koren, Roi Livni (29 Jun 2021)

Implicit Gradient Alignment in Distributed and Federated Learning
Yatin Dandi, Luis Barba, Martin Jaggi [FedML] (25 Jun 2021)

HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters
G. Appleby, M. Espadoto, Rui Chen, Sam Goree, A. Telea, Erik W. Anderson, Remco Chang (25 Jun 2021)

Sparse Flows: Pruning Continuous-depth Models
Lucas Liebenwein, Ramin Hasani, Alexander Amini, Daniela Rus (24 Jun 2021)

Minimum sharpness: Scale-invariant parameter-robustness of neural networks
Hikaru Ibayashi, Takuo Hamaguchi, Masaaki Imaizumi (23 Jun 2021)

Dangers of Bayesian Model Averaging under Covariate Shift
Pavel Izmailov, Patrick K. Nicholson, Sanae Lotfi, A. Wilson [OOD, UQCV, BDL] (22 Jun 2021)

Rethinking Adam: A Twofold Exponential Moving Average Approach
Yizhou Wang, Yue Kang, Can Qin, Huan Wang, Yi Xu, Yulun Zhang, Y. Fu [ODL] (22 Jun 2021)

Open-set Label Noise Can Improve Robustness Against Inherent Label Noise
Hongxin Wei, Lue Tao, Renchunzi Xie, Bo An [NoLa] (21 Jun 2021)

Better Training using Weight-Constrained Stochastic Dynamics
Benedict Leimkuhler, Tiffany J. Vlaar, Timothée Pouchon, Amos Storkey (20 Jun 2021)

Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples
Xuanyu Wu, Xuhong Li, Haoyi Xiong, Xiao Zhang, Siyu Huang, Dejing Dou (20 Jun 2021)

Deep Learning Through the Lens of Example Difficulty
R. Baldock, Hartmut Maennel, Behnam Neyshabur (17 Jun 2021)

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion (17 Jun 2021)

Robust Training in High Dimensions via Block Coordinate Geometric Median Descent
Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu (16 Jun 2021)

Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Mateusz Malinowski, Dimitrios Vytiniotis, G. Swirszcz, Viorica Patraucean, João Carreira (15 Jun 2021)

Economic Nowcasting with Long Short-Term Memory Artificial Neural Networks (LSTM)
D. Hopp [AI4TS] (15 Jun 2021)

On Large-Cohort Training for Federated Learning
Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith [FedML] (15 Jun 2021)

NG+: A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning
Minghan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu (14 Jun 2021)

Label Noise SGD Provably Prefers Flat Global Minimizers
Alexandru Damian, Tengyu Ma, Jason D. Lee [NoLa] (11 Jun 2021)

The dilemma of quantum neural networks
Yan Qian, Xinbiao Wang, Yuxuan Du, Xingyao Wu, Dacheng Tao (09 Jun 2021)

What training reveals about neural network complexity
Andreas Loukas, Marinos Poiitis, Stefanie Jegelka (08 Jun 2021)

Correcting Momentum in Temporal Difference Learning
Emmanuel Bengio, Joelle Pineau, Doina Precup (07 Jun 2021)

Regularization in ResNet with Stochastic Depth
Soufiane Hayou, Fadhel Ayed (06 Jun 2021)

RDA: Robust Domain Adaptation via Fourier Adversarial Attacking
Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu [AAML] (05 Jun 2021)

Solving hybrid machine learning tasks by traversing weight space geodesics
G. Raghavan, Matt Thomson (05 Jun 2021)

Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis
Stephan Wojtowytsch (04 Jun 2021)

Improving Neural Network Robustness via Persistency of Excitation
Kaustubh Sridhar, O. Sokolsky, Insup Lee, James Weimer [AAML] (03 Jun 2021)

Optimization Variance: Exploring Generalization Properties of DNNs
Xiao Zhang, Dongrui Wu, Haoyi Xiong, Bo Dai (03 Jun 2021)

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen, Cho-Jui Hsieh, Boqing Gong [ViT] (03 Jun 2021)

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics
Charles H. Martin, Michael W. Mahoney (01 Jun 2021)

Concurrent Adversarial Learning for Large-Batch Training
Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You [ODL] (01 Jun 2021)

A study on the plasticity of neural networks
Tudor Berariu, Wojciech M. Czarnecki, Soham De, J. Bornschein, Samuel L. Smith, Razvan Pascanu, Claudia Clopath [CLL, AI4CE] (31 May 2021)