On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
arXiv: 1609.04836

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani
Shankar Krishnan
Ying Xiao
ODL
365
375
0
29 Jan 2019
Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning
Nicolas García Trillos
Zachary T. Kaplan
D. Sanz-Alonso
ODL
125
3
0
29 Jan 2019
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample
A. Berahas
Majid Jahani
Peter Richtárik
Martin Takáč
398
49
0
28 Jan 2019
Augment your batch: better training with larger batches
Elad Hoffer
Tal Ben-Nun
Itay Hubara
Niv Giladi
Torsten Hoefler
Daniel Soudry
ODL
229
78
0
27 Jan 2019
Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Charles H. Martin
Michael W. Mahoney
299
146
0
24 Jan 2019
Large-Batch Training for LSTM and Beyond
Yang You
Jonathan Hseu
Chris Ying
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
237
96
0
24 Jan 2019
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan
174
89
0
24 Jan 2019
Decoupled Greedy Learning of CNNs
Eugene Belilovsky
Michael Eickenberg
Edouard Oyallon
352
128
0
23 Jan 2019
Visualized Insights into the Optimization Landscape of Fully Convolutional Networks
Jianjie Lu
K. Tong
244
12
0
20 Jan 2019
Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
Zhi-Qin John Xu
Yaoyu Zhang
Tao Luo
Yanyang Xiao
Zheng Ma
734
644
0
19 Jan 2019
Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent
Wenqing Hu
Zhanxing Zhu
Haoyi Xiong
Jun Huan
MLT
102
10
0
18 Jan 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli
Levent Sagun
Mert Gurbuzbalaban
500
292
0
18 Jan 2019
Ensemble Feature for Person Re-Identification
Jiabao Wang
Yang Li
Zhuang Miao
OOD
3DPC
239
1
0
17 Jan 2019
Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis
Yusuke Tsuzuku
Issei Sato
Masashi Sugiyama
264
86
0
15 Jan 2019
Neumann Networks for Inverse Problems in Imaging
Davis Gilton
Greg Ongie
Rebecca Willett
190
25
0
13 Jan 2019
Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions
Anna Sergeevna Bosman
A. Engelbrecht
Mardé Helbig
165
81
0
08 Jan 2019
CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
A. Koliousis
Pijika Watcharapichat
Matthias Weidlich
Kai Zou
Paolo Costa
Peter R. Pietzuch
218
71
0
08 Jan 2019
Generalization in Deep Networks: The Role of Distance from Initialization
Vaishnavh Nagarajan
J. Zico Kolter
ODL
200
97
0
07 Jan 2019
Federated Learning via Over-the-Air Computation
Kai Yang
Tao Jiang
Yuanming Shi
Z. Ding
FedML
352
1,009
0
31 Dec 2018
A continuous-time analysis of distributed stochastic gradient
Nicholas M. Boffi
Jean-Jacques E. Slotine
274
16
0
28 Dec 2018
Improving Generalization of Deep Neural Networks by Leveraging Margin Distribution
Shen-Huan Lyu
Lu Wang
Zhi Zhou
201
13
0
27 Dec 2018
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
Samet Oymak
Mahdi Soltanolkotabi
ODL
287
187
0
25 Dec 2018
Trust Region Based Adversarial Attack on Neural Networks
Z. Yao
A. Gholami
Peng Xu
Kurt Keutzer
Michael W. Mahoney
AAML
117
59
0
16 Dec 2018
An Empirical Model of Large-Batch Training
Sam McCandlish
Jared Kaplan
Dario Amodei
OpenAI Dota Team
901
356
0
14 Dec 2018
An Empirical Study of Example Forgetting during Deep Neural Network Learning
Mariya Toneva
Alessandro Sordoni
Rémi Tachet des Combes
Adam Trischler
Yoshua Bengio
Geoffrey J. Gordon
720
876
0
12 Dec 2018
Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training
Saurabh N. Adya
Vinay Palakkode
Oncel Tuzel
106
4
0
07 Dec 2018
Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
Xiaowu Dai
Yuhua Zhu
142
12
0
03 Dec 2018
Stochastic Training of Residual Networks: a Differential Equation Viewpoint
Qi Sun
Yunzhe Tao
Q. Du
169
26
0
01 Dec 2018
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant
N. Vemuri
Z. Yao
Vladimir Feinberg
A. Gholami
Kai Rothauge
Michael W. Mahoney
Joseph E. Gonzalez
198
77
0
30 Nov 2018
On Implicit Filter Level Sparsity in Convolutional Neural Networks
Dushyant Mehta
K. Kim
Christian Theobalt
179
29
0
29 Nov 2018
3D human pose estimation in video with temporal convolutions and semi-supervised training
Dario Pavllo
Christoph Feichtenhofer
David Grangier
Michael Auli
3DH
317
1,162
0
28 Nov 2018
Understanding the impact of entropy on policy optimization
Zafarali Ahmed
Nicolas Le Roux
Mohammad Norouzi
Dale Schuurmans
267
287
0
27 Nov 2018
Dense xUnit Networks
I. Kligvasser
T. Michaeli
192
3
0
27 Nov 2018
Forward Stability of ResNet and Its Variants
Journal of Mathematical Imaging and Vision (JMIV), 2018
Linan Zhang
Hayden Schaeffer
200
53
0
24 Nov 2018
Self-Referenced Deep Learning
Asian Conference on Computer Vision (ACCV), 2018
Xu Lan
Xiatian Zhu
S. Gong
262
24
0
19 Nov 2018
Generalizable Adversarial Training via Spectral Normalization
International Conference on Learning Representations (ICLR), 2018
Farzan Farnia
Jesse M. Zhang
David Tse
OOD
AAML
171
148
0
19 Nov 2018
Image Classification at Supercomputer Scale
Chris Ying
Sameer Kumar
Dehao Chen
Tao Wang
Youlong Cheng
VLM
187
126
0
16 Nov 2018
Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami
Hisahiro Suganuma
Pongsakorn U-chupala
Yoshiki Tanaka
Yuichi Kageyama
193
79
0
13 Nov 2018
Measuring the Effects of Data Parallelism on Neural Network Training
Journal of Machine Learning Research (JMLR), 2018
Christopher J. Shallue
Jaehoon Lee
J. Antognini
Jascha Sohl-Dickstein
Roy Frostig
George E. Dahl
569
452
0
08 Nov 2018
Bias and Generalization in Deep Generative Models: An Empirical Study
Neural Information Processing Systems (NeurIPS), 2018
Shengjia Zhao
Hongyu Ren
Arianna Yuan
Jiaming Song
Noah D. Goodman
Stefano Ermon
AI4CE
221
148
0
08 Nov 2018
Characterizing Well-Behaved vs. Pathological Deep Neural Networks
Mitchell Stern
199
0
0
07 Nov 2018
A Closer Look at Deep Policy Gradients
Andrew Ilyas
Logan Engstrom
Shibani Santurkar
Dimitris Tsipras
Firdaus Janoos
Larry Rudolph
Aleksander Madry
251
54
0
06 Nov 2018
Nonlinear Collaborative Scheme for Deep Neural Networks
Hui-Ling Zhen
Xi Lin
Alan Tang
Zhenhua Li
Qingfu Zhang
Sam Kwong
150
4
0
04 Nov 2018
Classification of Findings with Localized Lesions in Fundoscopic Images using a Regionally Guided CNN
Jaemin Son
Woong Bae
Sangkeun Kim
S. Park
Kyu-Hwan Jung
63
17
0
02 Nov 2018
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Anish Acharya
Rahul Goel
A. Metallinou
Inderjit Dhillon
205
65
0
01 Nov 2018
Multi-Label Robust Factorization Autoencoder and its Application in Predicting Drug-Drug Interactions
Xu Chu
Yang Lin
Jingyue Gao
Jiangtao Wang
Yasha Wang
Leye Wang
OOD
67
4
0
01 Nov 2018
Democratizing Production-Scale Distributed Deep Learning
Minghuang Ma
Hadi Pouransari
Daniel Chao
Saurabh N. Adya
S. Serrano
Yi Qin
Dan Gimnicher
Dominic Walsh
MoE
334
6
0
31 Oct 2018
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare
N. Keskar
Caiming Xiong
R. Socher
ODL
257
304
0
29 Oct 2018
Three Mechanisms of Weight Decay Regularization
Guodong Zhang
Chaoqi Wang
Bowen Xu
Roger C. Grosse
207
278
0
29 Oct 2018
Accurate, Efficient and Scalable Graph Embedding
Hanqing Zeng
Hongkuan Zhou
Ajitesh Srivastava
Rajgopal Kannan
Viktor Prasanna
GNN
287
81
0
28 Oct 2018