
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
ArXiv (abs) · PDF · HTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
How Does Batch Normalization Help Optimization?
Shibani Santurkar
Dimitris Tsipras
Andrew Ilyas
Aleksander Madry
ODL
547
1,670
0
29 May 2018
Distilling Knowledge for Search-based Structured Prediction
Yijia Liu
Wanxiang Che
Huaipeng Zhao
Bing Qin
Ting Liu
106
22
0
29 May 2018
Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling
Rainer Kelz
Gerhard Widmer
NoLa
84
4
0
28 May 2018
A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition
Alireza Sepas-Moghaddam
M. A. Haque
P. Correia
Kamal Nasrollahi
T. Moeslund
F. Pereira
CVBM
146
38
0
25 May 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
1.1K
1,196
0
24 May 2018
Input and Weight Space Smoothing for Semi-supervised Learning
Safa Cicek
Stefano Soatto
129
6
0
23 May 2018
Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle Pérez
Chico Q. Camargo
A. Louis
MLT AI4CE
459
256
0
22 May 2018
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Joeri Hermans
Gilles Louppe
128
5
0
22 May 2018
Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An
Jian-wei Lu
Lexing Ying
173
78
0
21 May 2018
Never look back - A modified EnKF method and its application to the training of neural networks without back propagation
E. Haber
F. Lucka
Lars Ruthotto
245
33
0
21 May 2018
SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen
Yandan Wang
Feng Yan
Cong Xu
Chunpeng Wu
Yiran Chen
Xue Yang
241
54
0
21 May 2018
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Liang Luo
Jacob Nelson
Luis Ceze
Amar Phanishayee
Arvind Krishnamurthy
339
128
0
21 May 2018
DNN or k-NN: That is the Generalize vs. Memorize Question
Gilad Cohen
Guillermo Sapiro
Raja Giryes
315
40
0
17 May 2018
Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
Minjie Wang
Chien-chin Huang
Jinyang Li
FedML
94
25
0
10 May 2018
On Visual Hallmarks of Robustness to Adversarial Malware
Alex Huang
Abdullah Al-Dujaili
Erik Hemberg
Una-May O’Reilly
AAML
143
8
0
09 May 2018
SaaS: Speed as a Supervisor for Semi-supervised Learning
Safa Cicek
Alhussein Fawzi
Stefano Soatto
BDL
196
20
0
02 May 2018
SHADE: Information Based Regularization for Deep Learning
Michael Blot
Thomas Robert
Nicolas Thome
Matthieu Cord
289
12
0
29 Apr 2018
HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering
Daniel Gribel
Thibaut Vidal
122
45
0
25 Apr 2018
Path Planning in Support of Smart Mobility Applications using Generative Adversarial Networks
M. Mohammadi
Ala I. Al-Fuqaha
Jun-Seok Oh
GAN
233
25
0
23 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
179
735
0
20 Apr 2018
Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Wenda Zhou
Victor Veitch
Morgane Austern
Ryan P. Adams
Peter Orbanz
326
228
0
16 Apr 2018
DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction
Huifeng Guo
Ruiming Tang
Yunming Ye
Zhenguo Li
Xiuqiang He
Zhenhua Dong
247
65
0
12 Apr 2018
Large scale distributed neural network training through online distillation
Rohan Anil
Gabriel Pereyra
Alexandre Passos
Róbert Ormándi
George E. Dahl
Geoffrey E. Hinton
FedML
681
441
0
09 Apr 2018
The Loss Surface of XOR Artificial Neural Networks
D. Mehta
Xiaojun Zhao
Edgar A. Bernal
D. Wales
323
19
0
06 Apr 2018
Training Tips for the Transformer Model
Martin Popel
Ondrej Bojar
482
326
0
01 Apr 2018
Online Second Order Methods for Non-Convex Stochastic Optimizations
Xi-Lin Li
OffRL ODL
111
4
0
26 Mar 2018
On the Local Minima of the Empirical Risk
Chi Jin
Lydia T. Liu
Rong Ge
Sai Li
FedML
323
58
0
25 Mar 2018
Multiple Sclerosis Lesion Segmentation from Brain MRI via Fully Convolutional Neural Networks
Snehashis Roy
J. Butman
Daniel Reich
P. Calabresi
Dzung L. Pham
MedIm
121
93
0
24 Mar 2018
A high-bias, low-variance introduction to Machine Learning for physicists
Pankaj Mehta
Marin Bukov
Ching-Hao Wang
A. G. Day
C. Richardson
Charles K. Fisher
D. Schwab
AI4CE
397
951
0
23 Mar 2018
Gradient Descent Quantizes ReLU Network Features
Hartmut Maennel
Olivier Bousquet
Sylvain Gelly
MLT
148
88
0
22 Mar 2018
Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks
A. Bustos
A. Pertusa
153
27
0
22 Mar 2018
Assessing Shape Bias Property of Convolutional Neural Networks
Hossein Hosseini
Baicen Xiao
Mayoore S. Jaiswal
Radha Poovendran
145
39
0
21 Mar 2018
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Carlo Albert
Levent Sagun
Mario Geiger
S. Spigler
Gerard Ben Arous
C. Cammarota
Yann LeCun
Matthieu Wyart
Giulio Biroli
AI4CE
343
124
0
19 Mar 2018
On the importance of single directions for generalization
Ari S. Morcos
David Barrett
Neil C. Rabinowitz
M. Botvinick
456
348
0
19 Mar 2018
On the insufficiency of existing momentum schemes for Stochastic Optimization
Information Theory and Applications Workshop (ITA), 2018
Rahul Kidambi
Praneeth Netrapalli
Prateek Jain
Sham Kakade
ODL
267
130
0
15 Mar 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Conference on Uncertainty in Artificial Intelligence (UAI), 2018
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML MoMe
649
1,898
0
14 Mar 2018
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi
Sangeetha Abdu Jyothi
R. Campbell
170
206
0
08 Mar 2018
Essentially No Barriers in Neural Network Energy Landscape
International Conference on Machine Learning (ICML), 2018
Felix Dräxler
K. Veschgini
M. Salmhofer
Fred Hamprecht
MoMe
594
491
0
02 Mar 2018
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu
Jingfeng Wu
Ting Yu
Lei Wu
Jin Ma
239
40
0
01 Mar 2018
Neural Inverse Rendering for General Reflectance Photometric Stereo
International Conference on Machine Learning (ICML), 2018
Tatsunori Taniai
Takanori Maehara
268
112
0
28 Feb 2018
Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion
Randall Balestriero
H. Glotin
Richard Baraniuk
BDL
184
6
0
27 Feb 2018
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
Neural Information Processing Systems (NeurIPS), 2018
T. Garipov
Pavel Izmailov
Dmitrii Podoprikhin
Dmitry Vetrov
A. Wilson
UQCV
677
852
0
27 Feb 2018
Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior
IEEE Transactions on Computational Imaging (TCI), 2018
Akshat Dave
Anil Kumar Vadathya
R. Subramanyam
R. Baburajan
Kaushik Mitra
185
24
0
27 Feb 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
349
133
0
24 Feb 2018
Sensitivity and Generalization in Neural Networks: an Empirical Study
Roman Novak
Yasaman Bahri
Daniel A. Abolafia
Jeffrey Pennington
Jascha Narain Sohl-Dickstein
AAML
413
482
0
23 Feb 2018
Characterizing Implicit Bias in Terms of Optimization Geometry
Suriya Gunasekar
Jason D. Lee
Daniel Soudry
Nathan Srebro
AI4CE
548
436
0
22 Feb 2018
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao
A. Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
449
177
0
22 Feb 2018
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini
Chang-rui Liu
Úlfar Erlingsson
Jernej Kos
Basel Alomair
768
1,322
0
22 Feb 2018
Improved Techniques For Weakly-Supervised Object Localization
Junsuk Choe
J. Park
Hyunjung Shim
WSOL
275
8
0
22 Feb 2018
An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg
Yuanzhi Li
Yang Yuan
MLT
347
332
0
17 Feb 2018
Page 31 of 34