ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Title
An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg
Yuanzhi Li
Yang Yuan
MLT
91
317
0
17 Feb 2018
Model compression via distillation and quantization
A. Polino
Razvan Pascanu
Dan Alistarh
MQ
88
733
0
15 Feb 2018
A Progressive Batching L-BFGS Method for Machine Learning
Raghu Bollapragada
Dheevatsa Mudigere
J. Nocedal
Hao-Jun Michael Shi
P. T. P. Tang
ODL
109
153
0
15 Feb 2018
Input-Aware Auto-Tuning of Compute-Bound HPC Kernels
Philippe Tillet
David D. Cox
48
36
0
15 Feb 2018
Stronger generalization bounds for deep nets via a compression approach
Sanjeev Arora
Rong Ge
Behnam Neyshabur
Yi Zhang
MLT AI4CE
122
643
0
14 Feb 2018
A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization
Tianyi Liu
Zhehui Chen
Enlu Zhou
T. Zhao
87
14
0
14 Feb 2018
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Zhize Li
Jian Li
97
116
0
13 Feb 2018
Classification of Things in DBpedia using Deep Neural Networks
Rahul Parundekar
40
2
0
07 Feb 2018
Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
Liangchen Luo
Jacob Nelson
Luis Ceze
Amar Phanishayee
Arvind Krishnamurthy
52
1
0
30 Jan 2018
On Scale-out Deep Learning Training for Cloud and HPC
Srinivas Sridharan
K. Vaidyanathan
Dhiraj D. Kalamkar
Dipankar Das
Mikhail E. Smorkalov
...
Dheevatsa Mudigere
Naveen Mellempudi
Sasikanth Avancha
Bharat Kaul
Pradeep Dubey
BDL
62
30
0
24 Jan 2018
Multi-pseudo Regularized Label for Generated Data in Person Re-Identification
Y. Huang
Jingsong Xu
Qiang Wu
Zhedong Zheng
Zhaoxiang Zhang
Jian Zhang
GAN
121
114
0
21 Jan 2018
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala
Georgios Kollias
C. Ward
F. Artico
75
20
0
11 Jan 2018
Theory of Deep Learning IIb: Optimization Properties of SGD
Chiyuan Zhang
Q. Liao
Alexander Rakhlin
Brando Miranda
Noah Golowich
T. Poggio
ODL
75
71
0
07 Jan 2018
The Multilinear Structure of ReLU Networks
T. Laurent
J. V. Brecht
92
51
0
29 Dec 2017
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
272
1,901
0
28 Dec 2017
Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations
Yuanzhi Li
Tengyu Ma
Hongyang R. Zhang
74
31
0
26 Dec 2017
Block-diagonal Hessian-free Optimization for Training Neural Networks
Huishuai Zhang
Caiming Xiong
James Bradbury
R. Socher
ODL
52
22
0
20 Dec 2017
Continual Prediction of Notification Attendance with Classical and Deep Network Approaches
Kleomenis Katevas
Ilias Leontiadis
M. Pielot
Joan Serrà
19
2
0
19 Dec 2017
Parallel Complexity of Forward and Backward Propagation
Maxim Naumov
42
8
0
18 Dec 2017
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
George Philipp
Basel Alomair
J. Carbonell
ODL
92
46
0
15 Dec 2017
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami
A. Azad
Peter H. Jin
Kurt Keutzer
A. Buluç
81
84
0
12 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
45
20
0
08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
107
136
0
06 Dec 2017
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
Cong Ma
Kaizheng Wang
Yuejie Chi
Yuxin Chen
125
241
0
28 Nov 2017
Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms
Yazhen Wang
54
17
0
27 Nov 2017
Critical Learning Periods in Deep Neural Networks
Alessandro Achille
Matteo Rovere
Stefano Soatto
72
100
0
24 Nov 2017
Deep supervised learning using local errors
Hesham Mostafa
V. Ramesh
Gert Cauwenberghs
68
115
0
17 Nov 2017
A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit
S. Cho
Sunghun Kang
Chang D. Yoo
79
1
0
17 Nov 2017
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
158
2,161
0
14 Nov 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
85
463
0
13 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu
Damian Podareanu
V. Saletore
63
55
0
12 Nov 2017
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
Ron Amit
Ron Meir
BDL MLT
73
176
0
03 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar
D. Sreedhar
Vaibhav Saxena
Yogish Sabharwal
Ashish Verma
58
4
0
02 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
127
996
0
01 Nov 2017
Deep Learning as a Mixed Convex-Combinatorial Optimization Problem
A. Friesen
Pedro M. Domingos
46
20
0
31 Oct 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari
Stefano Soatto
MLT
88
304
0
30 Oct 2017
The Implicit Bias of Gradient Descent on Separable Data
Daniel Soudry
Elad Hoffer
Mor Shpigel Nacson
Suriya Gunasekar
Nathan Srebro
208
924
0
27 Oct 2017
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin
Michael W. Mahoney
AI4CE
74
64
0
26 Oct 2017
Stability and Generalization of Learning Algorithms that Converge to Global Optima
Zachary B. Charles
Dimitris Papailiopoulos
MLT
57
163
0
23 Oct 2017
Function Norms and Regularization in Deep Networks
Amal Rannen Triki
Maxim Berman
Matthew B. Blaschko
45
2
0
18 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith
Quoc V. Le
BDL
104
253
0
17 Oct 2017
Searching for Activation Functions
Prajit Ramachandran
Barret Zoph
Quoc V. Le
97
612
0
16 Oct 2017
Generalization in Deep Learning
Kenji Kawaguchi
L. Kaelbling
Yoshua Bengio
ODL
164
459
0
16 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
41
10
0
10 Oct 2017
SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks
Martim Brandao
K. Hashimoto
A. Takanishi
55
6
0
09 Oct 2017
Neural Optimizer Search with Reinforcement Learning
Irwan Bello
Barret Zoph
Vijay Vasudevan
Quoc V. Le
ODL
88
386
0
21 Sep 2017
ImageNet Training in Minutes
Yang You
Zhao-jie Zhang
Cho-Jui Hsieh
J. Demmel
Kurt Keutzer
VLM LRM
132
57
0
14 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
V. Patel
MLT
61
8
0
14 Sep 2017
Normalized Direction-preserving Adam
Zijun Zhang
Lin Ma
Zongpeng Li
Chuan Wu
ODL
78
29
0
13 Sep 2017
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Eric Martin
Chris Cundy
116
104
0
12 Sep 2017