ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
arXiv:1609.04836 (v2, latest)

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown
Model compression via distillation and quantization
A. Polino
Razvan Pascanu
Dan Alistarh
MQ
269
799
0
15 Feb 2018
A Progressive Batching L-BFGS Method for Machine Learning
Raghu Bollapragada
Dheevatsa Mudigere
J. Nocedal
Hao-Jun Michael Shi
P. T. P. Tang
ODL
198
164
0
15 Feb 2018
Input-Aware Auto-Tuning of Compute-Bound HPC Kernels
Philippe Tillet
David D. Cox
84
37
0
15 Feb 2018
Stronger generalization bounds for deep nets via a compression approach
Sanjeev Arora
Rong Ge
Behnam Neyshabur
Yi Zhang
MLT, AI4CE
569
681
0
14 Feb 2018
A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization
Tianyi Liu
Zhehui Chen
Enlu Zhou
T. Zhao
192
14
0
14 Feb 2018
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Zhize Li
Jian Li
278
125
0
13 Feb 2018
Classification of Things in DBpedia using Deep Neural Networks
Rahul Parundekar
64
3
0
07 Feb 2018
Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
Liangchen Luo
Jacob Nelson
Luis Ceze
Amar Phanishayee
Arvind Krishnamurthy
98
1
0
30 Jan 2018
On Scale-out Deep Learning Training for Cloud and HPC
Srinivas Sridharan
K. Vaidyanathan
Dhiraj D. Kalamkar
Dipankar Das
Mikhail E. Smorkalov
...
Dheevatsa Mudigere
Naveen Mellempudi
Sasikanth Avancha
Bharat Kaul
Pradeep Dubey
BDL
139
31
0
24 Jan 2018
Multi-pseudo Regularized Label for Generated Data in Person Re-Identification
Y. Huang
Jingsong Xu
Qiang Wu
Zhedong Zheng
Zhaoxiang Zhang
Jian Zhang
GAN
266
120
0
21 Jan 2018
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala
Georgios Kollias
C. Ward
F. Artico
139
21
0
11 Jan 2018
Theory of Deep Learning IIb: Optimization Properties of SGD
Chiyuan Zhang
Q. Liao
Alexander Rakhlin
Alycia Lee
Noah Golowich
T. Poggio
ODL
153
73
0
07 Jan 2018
The Multilinear Structure of ReLU Networks
International Conference on Machine Learning (ICML), 2017
T. Laurent
J. V. Brecht
174
53
0
29 Dec 2017
Visualizing the Loss Landscape of Neural Nets
Neural Information Processing Systems (NeurIPS), 2017
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
574
2,121
0
28 Dec 2017
Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations
Yuanzhi Li
Tengyu Ma
Hongyang R. Zhang
244
31
0
26 Dec 2017
Block-diagonal Hessian-free Optimization for Training Neural Networks
Huishuai Zhang
Caiming Xiong
James Bradbury
R. Socher
ODL
116
24
0
20 Dec 2017
Continual Prediction of Notification Attendance with Classical and Deep Network Approaches
Kleomenis Katevas
Ilias Leontiadis
M. Pielot
Joan Serrà
80
2
0
19 Dec 2017
Parallel Complexity of Forward and Backward Propagation
Maxim Naumov
153
8
0
18 Dec 2017
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
George Philipp
Basel Alomair
J. Carbonell
ODL
283
48
0
15 Dec 2017
Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami
A. Azad
Peter H. Jin
Kurt Keutzer
A. Buluç
154
87
0
12 Dec 2017
Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan
Ying Xiao
Rif A. Saurous
ODL
146
20
0
08 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
243
147
0
06 Dec 2017
Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution
Cong Ma
Kaizheng Wang
Yuejie Chi
Yuxin Chen
250
255
0
28 Nov 2017
Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms
Yazhen Wang
286
17
0
27 Nov 2017
Critical Learning Periods in Deep Neural Networks
Alessandro Achille
Matteo Rovere
Stefano Soatto
174
112
0
24 Nov 2017
Deep supervised learning using local errors
Hesham Mostafa
V. Ramesh
Gert Cauwenberghs
151
124
0
17 Nov 2017
A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit
S. Cho
Sunghun Kang
Chang D. Yoo
167
1
0
17 Nov 2017
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
498
2,476
0
14 Nov 2017
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
339
501
0
13 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu
Damian Podareanu
V. Saletore
184
57
0
12 Nov 2017
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
Ron Amit
Ron Meir
BDL, MLT
518
181
0
03 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar
D. Sreedhar
Vaibhav Saxena
Yogish Sabharwal
Ashish Verma
107
5
0
02 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
600
1,078
0
01 Nov 2017
Deep Learning as a Mixed Convex-Combinatorial Optimization Problem
A. Friesen
Pedro M. Domingos
146
20
0
31 Oct 2017
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari
Stefano Soatto
MLT
220
314
0
30 Oct 2017
The Implicit Bias of Gradient Descent on Separable Data
Journal of Machine Learning Research (JMLR), 2017
Daniel Soudry
Elad Hoffer
Mor Shpigel Nacson
Suriya Gunasekar
Nathan Srebro
834
1,001
0
27 Oct 2017
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin
Michael W. Mahoney
AI4CE
145
64
0
26 Oct 2017
Stability and Generalization of Learning Algorithms that Converge to Global Optima
International Conference on Machine Learning (ICML), 2017
Zachary B. Charles
Dimitris Papailiopoulos
MLT
167
175
0
23 Oct 2017
Function Norms and Regularization in Deep Networks
Amal Rannen Triki
Maxim Berman
Matthew B. Blaschko
172
2
0
18 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith
Quoc V. Le
BDL
296
277
0
17 Oct 2017
Searching for Activation Functions
Prajit Ramachandran
Barret Zoph
Quoc V. Le
187
670
0
16 Oct 2017
Generalization in Deep Learning
Kenji Kawaguchi
L. Kaelbling
Yoshua Bengio
ODL
632
488
0
16 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
122
10
0
10 Oct 2017
SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks
Martim Brandao
K. Hashimoto
A. Takanishi
90
6
0
09 Oct 2017
Neural Optimizer Search with Reinforcement Learning
Irwan Bello
Barret Zoph
Vijay Vasudevan
Quoc V. Le
ODL
206
400
0
21 Sep 2017
ImageNet Training in Minutes
Yang You
Zhao-jie Zhang
Cho-Jui Hsieh
J. Demmel
Kurt Keutzer
VLM, LRM
274
60
0
14 Sep 2017
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
V. Patel
MLT
92
8
0
14 Sep 2017
Normalized Direction-preserving Adam
Zijun Zhang
Lin Ma
Zongpeng Li
Chuan Wu
ODL
175
30
0
13 Sep 2017
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Eric Martin
Chris Cundy
242
145
0
12 Sep 2017
Implicit Regularization in Deep Learning
Behnam Neyshabur
315
158
0
06 Sep 2017