On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Online Knowledge Distillation with Diverse Peers
Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, Chun-Yen Chen
FedML · 87 · 302 · 0 · 01 Dec 2019

A Reparameterization-Invariant Flatness Measure for Deep Neural Networks
Henning Petzka, Linara Adilova, Michael Kamp, C. Sminchisescu
ODL · 55 · 8 · 0 · 29 Nov 2019

On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Umut Simsekli, Mert Gurbuzbalaban, T. H. Nguyen, G. Richard, Levent Sagun
88 · 59 · 0 · 29 Nov 2019

Auto-Precision Scaling for Distributed Deep Learning
Ruobing Han, J. Demmel, Yang You
43 · 5 · 0 · 20 Nov 2019

Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia, Hao Su
73 · 19 · 0 · 19 Nov 2019

Signed Input Regularization
Saeid Asgari Taghanaki, Kumar Abhishek, Ghassan Hamarneh
AAML · 43 · 1 · 0 · 16 Nov 2019

Information-Theoretic Perspective of Federated Learning
Linara Adilova, Julia Rosenzweig, Michael Kamp
FedML · 13 · 4 · 0 · 15 Nov 2019

Optimal Mini-Batch Size Selection for Fast Gradient Descent
M. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, V. Salapura
38 · 9 · 0 · 15 Nov 2019

MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent
Karl Bäckström, Marina Papatriantafilou, P. Tsigas
58 · 12 · 0 · 08 Nov 2019

Small-GAN: Speeding Up GAN Training Using Core-sets
Samarth Sinha, Hang Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena
GAN · 99 · 77 · 0 · 29 Oct 2019

E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang
MQ · 107 · 83 · 0 · 29 Oct 2019

Neural Density Estimation and Likelihood-free Inference
George Papamakarios
BDL, DRL · 95 · 47 · 0 · 29 Oct 2019

A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Koyel Mukherjee, Alind Khare, Ashish Verma
74 · 15 · 0 · 25 Oct 2019

Diametrical Risk Minimization: Theory and Computations
Matthew Norton, J. Royset
57 · 19 · 0 · 24 Oct 2019

Explicitly Bayesian Regularizations in Deep Learning
Xinjie Lan, Kenneth Barner
UQCV, BDL, AI4CE · 103 · 1 · 0 · 22 Oct 2019

Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
Matteo Sordello, Niccolò Dalmasso, Hangfeng He, Weijie Su
50 · 7 · 0 · 18 Oct 2019

On Warm-Starting Neural Network Training
Jordan T. Ash, Ryan P. Adams
AI4CE · 58 · 21 · 0 · 18 Oct 2019

Improving the convergence of SGD through adaptive batch sizes
Scott Sievert, Zachary B. Charles
ODL · 63 · 8 · 0 · 18 Oct 2019

KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment
Vlad Hosu, Hanhe Lin, T. Szirányi, Dietmar Saupe
123 · 582 · 0 · 14 Oct 2019

Emergent properties of the local geometry of neural loss landscapes
Stanislav Fort, Surya Ganguli
120 · 51 · 0 · 14 Oct 2019

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
Colin Wei, Tengyu Ma
AAML, OOD · 72 · 85 · 0 · 09 Oct 2019

Parallelizing Training of Deep Generative Models on Massive Scientific Datasets
S. A. Jacobs, B. Van Essen, D. Hysom, Jae-Seung Yeom, Tim Moon, ..., J. Gaffney, Tom Benson, Peter B. Robinson, L. Peterson, B. Spears
BDL, AI4CE · 67 · 17 · 0 · 05 Oct 2019

Distributed Learning of Deep Neural Networks using Independent Subnet Training
John Shelton Hyatt, Cameron R. Wolfe, Michael Lee, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine
OOD · 92 · 39 · 0 · 04 Oct 2019

Generalization Bounds for Convolutional Neural Networks
Shan Lin, Jingwei Zhang
MLT · 60 · 35 · 0 · 03 Oct 2019

Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory
Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein
103 · 34 · 0 · 01 Oct 2019

How noise affects the Hessian spectrum in overparameterized neural networks
Ming-Bo Wei, D. Schwab
85 · 28 · 0 · 01 Oct 2019

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry
80 · 22 · 0 · 26 Sep 2019

GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks
Avraam Chatzimichailidis, Franz-Josef Pfreundt, N. Gauger, J. Keuper
52 · 10 · 0 · 26 Sep 2019

Towards Understanding the Transferability of Deep Representations
Hong Liu, Mingsheng Long, Jianmin Wang, Michael I. Jordan
66 · 26 · 0 · 26 Sep 2019

A Closer Look at Domain Shift for Deep Learning in Histopathology
Karin Stacke, Gabriel Eilertsen, Jonas Unger, Claes Lundström
OOD · 63 · 62 · 0 · 25 Sep 2019

EEG-Based Driver Drowsiness Estimation Using Feature Weighted Episodic Training
Yuqi Cui, Yifan Xu, Dongrui Wu
64 · 63 · 0 · 25 Sep 2019

Decentralized Markov Chain Gradient Descent
Tao Sun, Dongsheng Li
BDL · 91 · 11 · 0 · 23 Sep 2019

Scale MLPerf-0.6 models on Google TPU-v3 Pods
Sameer Kumar, Victor Bitorff, Dehao Chen, Chi-Heng Chou, Blake A. Hechtman, ..., Peter Mattson, Shibo Wang, Tao Wang, Yuanzhong Xu, Zongwei Zhou
67 · 39 · 0 · 21 Sep 2019

Understanding and Robustifying Differentiable Architecture Search
Arber Zela, T. Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter
OOD, AAML · 154 · 375 · 0 · 20 Sep 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 358 · 1,922 · 0 · 17 Sep 2019

Visualizing Movement Control Optimization Landscapes
Perttu Hämäläinen, Juuso Toikka, Amin Babadi, Karen Liu
56 · 7 · 0 · 17 Sep 2019

Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle
Michael Kaufmann, K. Kourtis, Celestine Mendler-Dünner, Adrian Schüpbach, Thomas Parnell
13 · 0 · 0 · 11 Sep 2019

Towards Understanding the Importance of Noise in Training Neural Networks
Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, T. Zhao
MLT · 92 · 26 · 0 · 07 Sep 2019

Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field Approximation
Konstantinos Pitas
58 · 8 · 0 · 06 Sep 2019

LCA: Loss Change Allocation for Neural Network Training
Janice Lan, Rosanne Liu, Hattie Zhou, J. Yosinski
73 · 25 · 0 · 03 Sep 2019

Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation
Junya Ono, Masao Utiyama, Eiichiro Sumita
AIMat, AI4CE · 38 · 7 · 0 · 02 Sep 2019

Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective
Guan-Horng Liu, Evangelos A. Theodorou
AI4CE · 118 · 72 · 0 · 28 Aug 2019

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Hao Jin, Dachao Lin, Zhihua Zhang
ODL · 35 · 2 · 0 · 18 Aug 2019

Regularizing CNN Transfer Learning with Randomised Regression
Yang Zhong, A. Maki
117 · 13 · 0 · 16 Aug 2019

Visualizing and Understanding the Effectiveness of BERT
Y. Hao, Li Dong, Furu Wei, Ke Xu
150 · 186 · 0 · 15 Aug 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
113 · 20 · 0 · 12 Aug 2019

Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise
Senwei Liang, Zhongzhan Huang, Mingfu Liang, Haizhao Yang
94 · 59 · 0 · 12 Aug 2019

Progressive Transfer Learning
Zhengxu Yu, Long Wei, Zhongming Jin, Jianqiang Huang, Deng Cai, Xiansheng Hua
VLM · 58 · 10 · 0 · 07 Aug 2019

How Does Learning Rate Decay Help Modern Neural Networks?
Kaichao You, Mingsheng Long, Jianmin Wang, Michael I. Jordan
66 · 4 · 0 · 05 Aug 2019

On the Existence of Simpler Machine Learning Models
Lesia Semenova, Cynthia Rudin, Ronald E. Parr
117 · 87 · 0 · 05 Aug 2019