ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
arXiv:1609.04836 · 15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
[ODL]

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,554 citing papers:
  • Accurate, Efficient and Scalable Graph Embedding [GNN] (28 Oct 2018)
    Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna
  • Can We Gain More from Orthogonality Regularizations in Training Deep CNNs? [OOD] (22 Oct 2018)
    Nitin Bansal, Xiaohan Chen, Zhangyang Wang
  • A Modern Take on the Bias-Variance Tradeoff in Neural Networks (19 Oct 2018)
    Brady Neal, Sarthak Mittal, A. Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas
  • Sequenced-Replacement Sampling for Deep Learning (19 Oct 2018)
    C. Ho, Dae Hoon Park, Wei Yang, Yi Chang
  • The loss surface of deep linear networks viewed through the algebraic geometry lens [ODL] (17 Oct 2018)
    D. Mehta, Tianran Chen, Tingting Tang, J. Hauenstein
  • Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks (16 Oct 2018)
    Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
  • Detecting Memorization in ReLU Networks (08 Oct 2018)
    Edo Collins, Siavash Bigdeli, Sabine Süsstrunk
  • Toward Understanding the Impact of Staleness in Distributed Machine Learning (08 Oct 2018)
    Wei-Ming Dai, Yi Zhou, Nanqing Dong, Huatian Zhang, Eric Xing
  • Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning [AI4CE] (02 Oct 2018)
    Charles H. Martin, Michael W. Mahoney
  • Large batch size training of neural networks with adversarial training and second-order information [ODL] (02 Oct 2018)
    Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney
  • Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning (29 Sep 2018)
    Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
  • Interpreting Adversarial Robustness: A View from Decision Surface in Input Space [AAML, OOD] (29 Sep 2018)
    Fuxun Yu, Chenchen Liu, Yanzhi Wang, Liang Zhao, Xiang Chen
  • GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration [GP] (28 Sep 2018)
    Jacob R. Gardner, Geoff Pleiss, D. Bindel, Kilian Q. Weinberger, A. Wilson
  • A theoretical framework for deep locally connected ReLU network [PINN] (28 Sep 2018)
    Yuandong Tian
  • Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks [OOD, UQCV] (24 Sep 2018)
    I. Cortés-Ciriano, A. Bender
  • Identifying Generalization Properties in Neural Networks (19 Sep 2018)
    Huan Wang, N. Keskar, Caiming Xiong, R. Socher
  • Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform (08 Sep 2018)
    Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng
  • Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation [ODL] (27 Aug 2018)
    Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji
  • Don't Use Large Mini-Batches, Use Local SGD (22 Aug 2018)
    Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
  • Understanding training and generalization in deep learning by Fourier analysis [AI4CE] (13 Aug 2018)
    Zhi-Qin John Xu
  • Fast Variance Reduction Method with Stochastic Batch Size (07 Aug 2018)
    Xuanqing Liu, Cho-Jui Hsieh
  • Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data [MLT] (03 Aug 2018)
    Yuanzhi Li, Yingyu Liang
  • Generalization Error in Deep Learning [AI4CE] (03 Aug 2018)
    Daniel Jakubovitz, Raja Giryes, M. Rodrigues
  • Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes (30 Jul 2018)
    Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
  • Learning Representations for Soft Skill Matching (20 Jul 2018)
    L. Sayfullina, Eric Malmi, Arno Solin
  • On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length [ODL] (13 Jul 2018)
    Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
  • Efficient Decentralized Deep Learning by Dynamic Model Averaging (09 Jul 2018)
    Michael Kamp, Linara Adilova, Joachim Sicking, Fabian Hüger, Peter Schlicht, Tim Wirtz, Stefan Wrobel
  • The Goldilocks zone: Towards better understanding of neural network loss landscapes (06 Jul 2018)
    Stanislav Fort, Adam Scherlis
  • Fuzzy Logic Interpretation of Quadratic Networks (04 Jul 2018)
    Fenglei Fan, Ge Wang
  • Optimization of neural networks via finite-value quantum fluctuations (01 Jul 2018)
    Masayuki Ohzeki, Shuntaro Okada, Masayoshi Terabe, S. Taguchi
  • Graph-to-Sequence Learning using Gated Graph Neural Networks [GNN] (26 Jun 2018)
    Daniel Beck, Gholamreza Haffari, Trevor Cohn
  • Stochastic natural gradient descent draws posterior samples in function space [BDL] (25 Jun 2018)
    Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein
  • Pushing the boundaries of parallel Deep Learning -- A practical approach [OOD] (25 Jun 2018)
    Paolo Viviani, M. Drocco, Marco Aldinucci
  • Character-Level Feature Extraction with Densely Connected Networks [3DV] (24 Jun 2018)
    Chanhee Lee, Young-Bum Kim, Dongyub Lee, Heuiseok Lim
  • PCA of high dimensional random walks with comparison to neural network training [OOD] (22 Jun 2018)
    J. Antognini, Jascha Narain Sohl-Dickstein
  • On the Spectral Bias of Neural Networks (22 Jun 2018)
    Nasim Rahaman, A. Baratin, Devansh Arpit, Felix Dräxler, Min Lin, Fred Hamprecht, Yoshua Bengio, Aaron Courville
  • Faster SGD training by minibatch persistency (19 Jun 2018)
    M. Fischetti, Iacopo Mandatelli, Domenico Salvagnin
  • Using Mode Connectivity for Loss Landscape Analysis (18 Jun 2018)
    Akhilesh Deepak Gotmare, N. Keskar, Caiming Xiong, R. Socher
  • Laplacian Smoothing Gradient Descent [ODL] (17 Jun 2018)
    Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, A. Lin
  • There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average (14 Jun 2018)
    Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, A. Wilson
  • Knowledge Distillation by On-the-Fly Native Ensemble (12 Jun 2018)
    Xu Lan, Xiatian Zhu, S. Gong
  • The Effect of Network Width on the Performance of Large-batch Training (11 Jun 2018)
    Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris
  • Towards Binary-Valued Gates for Robust LSTM Training [MQ] (08 Jun 2018)
    Zhuohan Li, Di He, Fei Tian, Wei-neng Chen, Tao Qin, Liwei Wang, Tie-Yan Liu
  • Training Faster by Separating Modes of Variation in Batch-normalized Models (07 Jun 2018)
    Mahdi M. Kalayeh, M. Shah
  • Implicit regularization and solution uniqueness in over-parameterized matrix sensing (06 Jun 2018)
    Kelly Geyer, Anastasios Kyrillidis, A. Kalev
  • Layer rotation: a surprisingly powerful indicator of generalization in deep networks? [MLT] (05 Jun 2018)
    Simon Carbonnelle, Christophe De Vleeschouwer
  • Backdrop: Stochastic Backpropagation (04 Jun 2018)
    Siavash Golkar, Kyle Cranmer
  • Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach [FedML] (04 Jun 2018)
    Ryo Karakida, S. Akaho, S. Amari
  • Implicit Bias of Gradient Descent on Linear Convolutional Networks [MDE] (01 Jun 2018)
    Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro
  • Understanding Batch Normalization (01 Jun 2018)
    Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger