On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,554 citing papers.
The Dynamics of Learning: A Random Matrix Approach
Zhenyu Liao, Romain Couillet · AI4CE · 30 May 2018

How Does Batch Normalization Help Optimization?
Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry · ODL · 29 May 2018

Distilling Knowledge for Search-based Structured Prediction
Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu · 29 May 2018

Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling
Rainer Kelz, Gerhard Widmer · NoLa · 28 May 2018

A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition
Alireza Sepas-Moghaddam, M. A. Haque, P. Correia, Kamal Nasrollahi, T. Moeslund, F. Pereira · CVBM · 25 May 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 24 May 2018

Input and Weight Space Smoothing for Semi-supervised Learning
Safa Cicek, Stefano Soatto · 23 May 2018

Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle Pérez, Chico Q. Camargo, A. Louis · MLT, AI4CE · 22 May 2018

Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Joeri Hermans, Gilles Louppe · 22 May 2018

Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An, Jian-wei Lu, Lexing Ying · 21 May 2018

Never look back - A modified EnKF method and its application to the training of neural networks without back propagation
E. Haber, F. Lucka, Lars Ruthotto · 21 May 2018

SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li · 21 May 2018

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy · 21 May 2018

DNN or k-NN: That is the Generalize vs. Memorize Question
Gilad Cohen, Guillermo Sapiro, Raja Giryes · 17 May 2018

Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
Minjie Wang, Chien-chin Huang, Jinyang Li · FedML · 10 May 2018

On Visual Hallmarks of Robustness to Adversarial Malware
Alex Huang, Abdullah Al-Dujaili, Erik Hemberg, Una-May O’Reilly · AAML · 09 May 2018

SaaS: Speed as a Supervisor for Semi-supervised Learning
Safa Cicek, Alhussein Fawzi, Stefano Soatto · BDL · 02 May 2018

SHADE: Information Based Regularization for Deep Learning
Michael Blot, Thomas Robert, Nicolas Thome, Matthieu Cord · 29 Apr 2018

HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering
Daniel Gribel, Thibaut Vidal · 25 Apr 2018

Path Planning in Support of Smart Mobility Applications using Generative Adversarial Networks
M. Mohammadi, Ala I. Al-Fuqaha, Jun-Seok Oh · GAN · 23 Apr 2018

Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters, Carlo Luschi · ODL · 20 Apr 2018

Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P. Adams, Peter Orbanz · 16 Apr 2018

DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction
Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He, Zhenhua Dong · 12 Apr 2018

Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton · FedML · 09 Apr 2018

The Loss Surface of XOR Artificial Neural Networks
D. Mehta, Xiaojun Zhao, Edgar A. Bernal, D. Wales · 06 Apr 2018

Training Tips for the Transformer Model
Martin Popel, Ondrej Bojar · 01 Apr 2018

Online Second Order Methods for Non-Convex Stochastic Optimizations
Xi-Lin Li · OffRL, ODL · 26 Mar 2018

On the Local Minima of the Empirical Risk
Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan · FedML · 25 Mar 2018

Multiple Sclerosis Lesion Segmentation from Brain MRI via Fully Convolutional Neural Networks
Snehashis Roy, J. Butman, Daniel Reich, P. Calabresi, Dzung L. Pham · MedIm · 24 Mar 2018

A high-bias, low-variance introduction to Machine Learning for physicists
Pankaj Mehta, Marin Bukov, Ching-Hao Wang, A. G. Day, C. Richardson, Charles K. Fisher, D. Schwab · AI4CE · 23 Mar 2018

Gradient Descent Quantizes ReLU Network Features
Hartmut Maennel, Olivier Bousquet, Sylvain Gelly · MLT · 22 Mar 2018

Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks
A. Bustos, A. Pertusa · 22 Mar 2018

Assessing Shape Bias Property of Convolutional Neural Networks
Hossein Hosseini, Baicen Xiao, Mayoore S. Jaiswal, Radha Poovendran · 21 Mar 2018

Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Marco Baity-Jesi, Levent Sagun, Mario Geiger, S. Spigler, Gerard Ben Arous, C. Cammarota, Yann LeCun, Matthieu Wyart, Giulio Biroli · AI4CE · 19 Mar 2018

On the importance of single directions for generalization
Ari S. Morcos, David Barrett, Neil C. Rabinowitz, M. Botvinick · 19 Mar 2018

On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham Kakade · ODL · 15 Mar 2018

Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov, Dmitrii Podoprikhin, T. Garipov, Dmitry Vetrov, A. Wilson · FedML, MoMe · 14 Mar 2018

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, R. Campbell · 08 Mar 2018

Essentially No Barriers in Neural Network Energy Landscape
Felix Dräxler, K. Veschgini, M. Salmhofer, Fred Hamprecht · MoMe · 02 Mar 2018

The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing Zhu, Jingfeng Wu, Ting Yu, Lei Wu, Jin Ma · 01 Mar 2018

Neural Inverse Rendering for General Reflectance Photometric Stereo
Tatsunori Taniai, Takanori Maehara · 28 Feb 2018

Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion
Randall Balestriero, H. Glotin, Richard Baraniuk · BDL · 27 Feb 2018

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
T. Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, A. Wilson · UQCV · 27 Feb 2018

Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior
Akshat Dave, Anil Kumar Vadathya, R. Subramanyam, R. Baburajan, Kaushik Mitra · 27 Feb 2018

A Walk with SGD
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio · 24 Feb 2018

Sensitivity and Generalization in Neural Networks: an Empirical Study
Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Narain Sohl-Dickstein · AAML · 23 Feb 2018

Characterizing Implicit Bias in Terms of Optimization Geometry
Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro · AI4CE · 22 Feb 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney · 22 Feb 2018

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini, Chang-rui Liu, Ulfar Erlingsson, Jernej Kos, Basel Alomair · 22 Feb 2018

Improved Techniques For Weakly-Supervised Object Localization
Junsuk Choe, J. Park, Hyunjung Shim · WSOL · 22 Feb 2018