ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,653 citing papers.
Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?
Nitin Bansal, Xiaohan Chen, Zinan Lin
22 Oct 2018

A Modern Take on the Bias-Variance Tradeoff in Neural Networks
Brady Neal, Sarthak Mittal, A. Baratin, Vinayak Tantia, Matthew Scicluna, Damien Scieur, Alexia Jolicoeur-Martineau
19 Oct 2018

Sequenced-Replacement Sampling for Deep Learning
C. Ho, Dae Hoon Park, Wei Yang, Yi Chang
19 Oct 2018

The loss surface of deep linear networks viewed through the algebraic geometry lens
D. Mehta, Tianran Chen, Tingting Tang, J. Hauenstein
17 Oct 2018

Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
16 Oct 2018

Detecting Memorization in ReLU Networks
Edo Collins, Siavash Bigdeli, Sabine Süsstrunk
08 Oct 2018

Toward Understanding the Impact of Staleness in Distributed Machine Learning
Wei-Ming Dai, Yi Zhou, Nanqing Dong, Huatian Zhang, Eric Xing
08 Oct 2018

Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin, Michael W. Mahoney
02 Oct 2018

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Alfons Kemper, Kurt Keutzer, Michael W. Mahoney
02 Oct 2018

Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning
Cheolhyoung Lee, Dong Wang, Wanmo Kang
29 Sep 2018

Interpreting Adversarial Robustness: A View from Decision Surface in Input Space
Fuxun Yu, Chenchen Liu, Yanzhi Wang, Bo Pan, Xiang Chen
29 Sep 2018

GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration (NeurIPS, 2018)
Jacob R. Gardner, Geoff Pleiss, D. Bindel, Kilian Q. Weinberger, A. Wilson
28 Sep 2018

A theoretical framework for deep locally connected ReLU network
Yuandong Tian
28 Sep 2018

Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks (Journal of Chemical Information and Modeling, 2018)
I. Cortés-Ciriano, A. Bender
24 Sep 2018

Identifying Generalization Properties in Neural Networks
Huan Wang, N. Keskar, Caiming Xiong, R. Socher
19 Sep 2018

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng
08 Sep 2018

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji
27 Aug 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
22 Aug 2018

Understanding training and generalization in deep learning by Fourier analysis
Zhi-Qin John Xu
13 Aug 2018

Fast Variance Reduction Method with Stochastic Batch Size
Xuanqing Liu, Cho-Jui Hsieh
07 Aug 2018

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi Li, Yingyu Liang
03 Aug 2018

Generalization Error in Deep Learning
Daniel Jakubovitz, Raja Giryes, M. Rodrigues
03 Aug 2018

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
Chencan Wu, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
30 Jul 2018

Learning Representations for Soft Skill Matching (International Joint Conference on the Analysis of Images, Social Networks and Texts, 2018)
L. Sayfullina, Eric Malmi, Arno Solin
20 Jul 2018

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
13 Jul 2018

Efficient Decentralized Deep Learning by Dynamic Model Averaging
Michael Kamp, Linara Adilova, Joachim Sicking, Fabian Hüger, Peter Schlicht, Tim Wirtz, Stefan Wrobel
09 Jul 2018

The Goldilocks zone: Towards better understanding of neural network loss landscapes (AAAI, 2018)
Stanislav Fort, Adam Scherlis
06 Jul 2018

Fuzzy Logic Interpretation of Quadratic Networks
Fenglei Fan, Ge Wang
04 Jul 2018

Optimization of neural networks via finite-value quantum fluctuations (Scientific Reports, 2018)
Masayuki Ohzeki, Shuntaro Okada, Masayoshi Terabe, S. Taguchi
01 Jul 2018

Graph-to-Sequence Learning using Gated Graph Neural Networks
Daniel Beck, Gholamreza Haffari, Trevor Cohn
26 Jun 2018

Stochastic natural gradient descent draws posterior samples in function space
Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein
25 Jun 2018

Pushing the boundaries of parallel Deep Learning -- A practical approach
Paolo Viviani, M. Drocco, Marco Aldinucci
25 Jun 2018

Character-Level Feature Extraction with Densely Connected Networks
Chanhee Lee, Young-Bum Kim, Dongyub Lee, Heuiseok Lim
24 Jun 2018

24 Jun 2018
PCA of high dimensional random walks with comparison to neural network
  training
PCA of high dimensional random walks with comparison to neural network training
J. Antognini
Jascha Narain Sohl-Dickstein
OOD
113
28
0
22 Jun 2018
On the Spectral Bias of Neural Networks
On the Spectral Bias of Neural Networks
Nasim Rahaman
A. Baratin
Devansh Arpit
Felix Dräxler
Min Lin
Fred Hamprecht
Yoshua Bengio
Aaron Courville
552
1,904
0
22 Jun 2018
Faster SGD training by minibatch persistency
Faster SGD training by minibatch persistency
M. Fischetti
Iacopo Mandatelli
Domenico Salvagnin
81
6
0
19 Jun 2018
Using Mode Connectivity for Loss Landscape Analysis
Using Mode Connectivity for Loss Landscape Analysis
Akhilesh Deepak Gotmare
N. Keskar
Caiming Xiong
R. Socher
174
29
0
18 Jun 2018
Laplacian Smoothing Gradient Descent
Laplacian Smoothing Gradient Descent
Stanley Osher
Bao Wang
Penghang Yin
Xiyang Luo
Farzin Barekat
Minh Pham
A. Lin
ODL
347
46
0
17 Jun 2018
There Are Many Consistent Explanations of Unlabeled Data: Why You Should
  Average
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Ben Athiwaratkun
Marc Finzi
Pavel Izmailov
A. Wilson
553
258
0
14 Jun 2018
Knowledge Distillation by On-the-Fly Native Ensemble
Knowledge Distillation by On-the-Fly Native Ensemble
Xu Lan
Xiatian Zhu
S. Gong
521
528
0
12 Jun 2018
The Effect of Network Width on the Performance of Large-batch Training
Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris
11 Jun 2018

Towards Binary-Valued Gates for Robust LSTM Training
Zhuohan Li, Di He, Fei Tian, Wei-neng Chen, Tao Qin, Liwei Wang, Tie-Yan Liu
08 Jun 2018

Training Faster by Separating Modes of Variation in Batch-normalized Models
Mahdi M. Kalayeh, M. Shah
07 Jun 2018

Implicit regularization and solution uniqueness in over-parameterized matrix sensing
Kelly Geyer, Anastasios Kyrillidis, A. Kalev
06 Jun 2018

Layer rotation: a surprisingly powerful indicator of generalization in deep networks?
Simon Carbonnelle, Christophe De Vleeschouwer
05 Jun 2018

Backdrop: Stochastic Backpropagation
Siavash Golkar, Kyle Cranmer
04 Jun 2018

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, S. Akaho, S. Amari
04 Jun 2018

Implicit Bias of Gradient Descent on Linear Convolutional Networks
Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro
01 Jun 2018

Understanding Batch Normalization
Johan Bjorck, Daniel Schwalbe-Koda, B. Selman, Kilian Q. Weinberger
01 Jun 2018

The Dynamics of Learning: A Random Matrix Approach
Zhenyu Liao, Romain Couillet
30 May 2018
Page 30 of 34