ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Hyperplane Arrangements of Trained ConvNets Are Biased
Matteo Gamba, S. Carlsson, Hossein Azizpour, Mårten Björkman
17 Mar 2020

Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations
Subhajit Chaudhury, T. Yamasaki
14 Mar 2020

Interference and Generalization in Temporal Difference Learning
Emmanuel Bengio, Joelle Pineau, Doina Precup
13 Mar 2020

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang, Shaoshuai Shi, Wei Wang, Yue Liu, Xiaowen Chu
10 Mar 2020

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu
09 Mar 2020

AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
Majed El Helou, Frederike Dumbgen, Sabine Süsstrunk
07 Mar 2020 · CLL, AI4CE

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
04 Mar 2020 · ODL
Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
Wesley J. Maddox, Gregory W. Benton, A. Wilson
04 Mar 2020

Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond
Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, B. Kailkhura, Xinyu Lin, Cho-Jui Hsieh
28 Feb 2020 · AAML

The Implicit and Explicit Regularization Effects of Dropout
Colin Wei, Sham Kakade, Tengyu Ma
28 Feb 2020

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
S. Chatterjee
25 Feb 2020 · ODL, OOD

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors
A. Banerjee, Tiancong Chen, Yingxue Zhou
23 Feb 2020 · BDL

Communication-Efficient Edge AI: Algorithms and Systems
Yuanming Shi, Kai Yang, Tao Jiang, Jun Zhang, Khaled B. Letaief
22 Feb 2020 · GNN

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020
Parallel and distributed asynchronous adaptive stochastic gradient methods
Yangyang Xu, Yibo Xu, Yonggui Yan, Colin Sutcher-Shepard, Leopold Grinberg, Jiewei Chen
21 Feb 2020

Bayesian Deep Learning and a Probabilistic Perspective of Generalization
A. Wilson, Pavel Izmailov
20 Feb 2020 · UQCV, BDL, OOD

Do We Need Zero Training Loss After Achieving Zero Training Error?
Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama
20 Feb 2020 · AI4CE

Revisiting Training Strategies and Generalization Performance in Deep Metric Learning
Karsten Roth, Timo Milbich, Samarth Sinha, Prateek Gupta, Bjorn Ommer, Joseph Paul Cohen
19 Feb 2020

Unique Properties of Flat Minima in Deep Networks
Rotem Mulayoff, T. Michaeli
11 Feb 2020 · ODL

Think Global, Act Local: Relating DNN generalisation and node-level SNR
Paul Norridge
11 Feb 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama
10 Feb 2020 · ODL

Large Batch Training Does Not Need Warmup
Zhouyuan Huo, Bin Gu, Heng-Chiao Huang
04 Feb 2020 · AI4CE, ODL

Optimizing Loss Functions Through Multivariate Taylor Polynomial Parameterization
Santiago Gonzalez, Risto Miikkulainen
31 Jan 2020
The Case for Bayesian Deep Learning
A. Wilson
29 Jan 2020 · UQCV, BDL, OOD

Identifying Mislabeled Data using the Area Under the Margin Ranking
Geoff Pleiss, Tianyi Zhang, Ethan R. Elenberg, Kilian Q. Weinberger
28 Jan 2020 · NoLa

Automatic phantom test pattern classification through transfer learning with deep neural networks
Rafael B. Fricks, Justin Solomon, Ehsan Samei
22 Jan 2020 · MedIm

A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis
Christopher J. Urban, Daniel J. Bauer
22 Jan 2020 · BDL

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Jinlong Liu, Guo-qing Jiang, Yunzhi Bai, Ting Chen, Huayan Wang
21 Jan 2020 · AI4CE

SEERL: Sample Efficient Ensemble Reinforcement Learning
Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul
15 Jan 2020

Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification
Xin Jin, Cuiling Lan, Wenjun Zeng, Zhibo Chen
15 Jan 2020

Understanding Generalization in Deep Learning via Tensor Methods
Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang
14 Jan 2020

Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation
Madan Ravi Ganesh, Jason J. Corso
13 Jan 2020 · ODL
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta, S. Serrano, D. DeCoste
07 Jan 2020 · MoMe

Relative Flatness and Generalization
Henning Petzka, Michael Kamp, Linara Adilova, C. Sminchisescu, Mario Boley
03 Jan 2020

'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
Moshir Harsh, J. Tubiana, Simona Cocco, R. Monasson
30 Dec 2019

CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Konpat Preechakul, B. Kijsirikul
24 Dec 2019 · ODL

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
Aleksandr Shevchenko, Marco Mondelli
20 Dec 2019

Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification
Bennet Breier, A. Onken
20 Dec 2019

Optimization for deep learning: theory and algorithms
Ruoyu Sun
19 Dec 2019 · ODL

Tangent Space Separability in Feedforward Neural Networks
Balint Daroczy, Rita Aleksziev, András A. Benczúr
18 Dec 2019

Learning under Model Misspecification: Applications to Variational and Ensemble methods
A. Masegosa
18 Dec 2019

On the Bias-Variance Tradeoff: Textbooks Need an Update
Brady Neal
17 Dec 2019
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
11 Dec 2019 · MoMe

Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability
Shuaicheng Liu, Ze Zhang, Kai Song, B. Zeng
10 Dec 2019

InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers
T. Nguyen, Animesh Garg, Richard G. Baraniuk, Anima Anandkumar
09 Dec 2019 · TPM

Observational Overfitting in Reinforcement Learning
Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur
06 Dec 2019 · OffRL

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio
04 Dec 2019 · AI4CE

The Group Loss for Deep Metric Learning
Ismail Elezi, Sebastiano Vascon, Alessandro Torcinovich, Marcello Pelillo, Laura Leal-Taixe
01 Dec 2019