On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
ArXiv (abs) · PDF · HTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Extending the step-size restriction for gradient descent to avoid strict saddle points
Hayden Schaeffer
S. McCalla
107
4
0
05 Aug 2019
Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM
Qianqian Tong
Guannan Liang
J. Bi
105
7
0
02 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
90
59
0
30 Jul 2019
Taming Momentum in a Distributed Asynchronous Environment
Ido Hakimi
Saar Barkai
Moshe Gabel
Assaf Schuster
93
23
0
26 Jul 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li
Qilong Gu
Yingxue Zhou
Tiancong Chen
A. Banerjee
ODL
88
52
0
24 Jul 2019
BPPSA: Scaling Back-propagation by Parallel Scan Algorithm
Shang Wang
Yifan Bai
Gennady Pekhimenko
60
7
0
23 Jul 2019
Spectral Analysis of Latent Representations
Justin Shenk
Mats L. Richter
Anders Arpteg
Mikael Huss
FAtt
23
6
0
19 Jul 2019
Towards Understanding Generalization in Gradient-Based Meta-Learning
Simon Guiroy
Vikas Verma
C. Pal
73
21
0
16 Jul 2019
Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems
Mark D. McDonnell
Hesham Mostafa
Runchun Wang
Andre van Schaik
MQ
44
2
0
16 Jul 2019
Learning Neural Networks with Adaptive Regularization
Han Zhao
Yao-Hung Hubert Tsai
Ruslan Salakhutdinov
Geoffrey J. Gordon
42
15
0
14 Jul 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li
Colin Wei
Tengyu Ma
90
299
0
10 Jul 2019
Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale
A. G. Baydin
Lei Shao
W. Bhimji
Lukas Heinrich
Lawrence Meadows
...
Philip Torr
Victor W. Lee
Kyle Cranmer
P. Prabhat
Frank Wood
73
58
0
08 Jul 2019
Stochastic Gradient and Langevin Processes
Xiang Cheng
Dong Yin
Peter L. Bartlett
Michael I. Jordan
64
5
0
07 Jul 2019
Time-to-Event Prediction with Neural Networks and Cox Regression
Håvard Kvamme
Ørnulf Borgan
Ida Scheel
383
337
0
01 Jul 2019
Deep Gamblers: Learning to Abstain with Portfolio Theory
Liu Ziyin
Zhikang T. Wang
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
Masahito Ueda
109
113
0
29 Jun 2019
On improving deep learning generalization with adaptive sparse connectivity
Shiwei Liu
Decebal Constantin Mocanu
Mykola Pechenizkiy
ODL
39
8
0
27 Jun 2019
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
Kosuke Haruki
Taiji Suzuki
Yohei Hamakawa
Takeshi Toda
Ryuji Sakai
M. Ozawa
Mitsuhiro Kimura
ODL
61
17
0
26 Jun 2019
The Difficulty of Training Sparse Neural Networks
Utku Evci
Fabian Pedregosa
Aidan Gomez
Erich Elsen
72
101
0
25 Jun 2019
Is It Worth the Attention? A Comparative Evaluation of Attention Layers for Argument Unit Segmentation
Maximilian Spliethover
Jonas Klaff
Hendrik Heuer
43
10
0
24 Jun 2019
First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
T. H. Nguyen
Umut Simsekli
Mert Gurbuzbalaban
G. Richard
79
65
0
21 Jun 2019
On the interplay between noise and curvature and its effect on optimization and generalization
Valentin Thomas
Fabian Pedregosa
B. V. Merrienboer
Pierre-Antoine Manzagol
Yoshua Bengio
Nicolas Le Roux
52
61
0
18 Jun 2019
On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu
Wenqing Hu
Haoyi Xiong
Jun Huan
Vladimir Braverman
Zhanxing Zhu
MLT
70
10
0
18 Jun 2019
A Survey of Optimization Methods from a Machine Learning Perspective
Shiliang Sun
Zehui Cao
Han Zhu
Jing Zhao
82
562
0
17 Jun 2019
Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias
Stéphane d'Ascoli
Levent Sagun
Joan Bruna
Giulio Biroli
85
37
0
16 Jun 2019
Learning to Forget for Meta-Learning
Sungyong Baik
Seokil Hong
Kyoung Mu Lee
CLL, KELM
75
89
0
13 Jun 2019
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
Samet Oymak
Zalan Fabian
Mingchen Li
Mahdi Soltanolkotabi
MLT
87
88
0
12 Jun 2019
Semi-flat minima and saddle points by embedding neural networks to overparameterization
Kenji Fukumizu
Shoichiro Yamaguchi
Yoh-ichi Mototake
Mirai Tanaka
3DPC
64
25
0
12 Jun 2019
Large Scale Structure of Neural Network Loss Landscapes
Stanislav Fort
Stanislaw Jastrzebski
72
84
0
11 Jun 2019
The Generalization-Stability Tradeoff In Neural Network Pruning
Brian Bartoldson
Ari S. Morcos
Adrian Barbu
G. Erlebacher
94
76
0
09 Jun 2019
The Implicit Bias of AdaGrad on Separable Data
Qian Qian
Xiaoyuan Qian
70
23
0
09 Jun 2019
Understanding Generalization through Visualizations
Wenjie Huang
Z. Emam
Micah Goldblum
Liam H. Fowl
J. K. Terry
Furong Huang
Tom Goldstein
AI4CE
51
80
0
07 Jun 2019
Inductive Bias of Gradient Descent based Adversarial Training on Separable Data
Yan Li
Ethan X. Fang
Huan Xu
T. Zhao
78
16
0
07 Jun 2019
The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida
S. Akaho
S. Amari
73
41
0
07 Jun 2019
Fault Diagnosis of Rotary Machines using Deep Convolutional Neural Network with three axis signal input
Davor Kolar
D. Lisjak
M. Pająk
D. Pavković
24
0
0
06 Jun 2019
On the Convergence of SARAH and Beyond
Bingcong Li
Meng Ma
G. Giannakis
68
27
0
05 Jun 2019
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Devansh Arpit
Victor Campos
Yoshua Bengio
75
59
0
05 Jun 2019
Deep Q-Learning for Directed Acyclic Graph Generation
Laura D'Arcy
P. Corcoran
Alun D. Preece
BDL, GNN
26
5
0
05 Jun 2019
An Empirical Study on Hyperparameters and their Interdependence for RL Generalization
Xingyou Song
Yilun Du
Jacob Jackson
AI4CE
43
8
0
02 Jun 2019
Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora
Nadav Cohen
Wei Hu
Yuping Luo
AI4CE
111
509
0
31 May 2019
Luck Matters: Understanding Training Dynamics of Deep ReLU Networks
Yuandong Tian
Tina Jiang
Qucheng Gong
Ari S. Morcos
169
25
0
31 May 2019
Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience
Vaishnavh Nagarajan
J. Zico Kolter
102
101
0
30 May 2019
Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Aditya Golatkar
Alessandro Achille
Stefano Soatto
80
97
0
30 May 2019
Meta Dropout: Learning to Perturb Features for Generalization
Haebeom Lee
Taewook Nam
Eunho Yang
Sung Ju Hwang
OOD
59
3
0
30 May 2019
Mixed Precision Training With 8-bit Floating Point
Naveen Mellempudi
Sudarshan Srinivasan
Dipankar Das
Bharat Kaul
MQ
78
69
0
29 May 2019
Where is the Information in a Deep Neural Network?
Alessandro Achille
Giovanni Paolini
Stefano Soatto
85
82
0
29 May 2019
High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks
Haohan Wang
Xindi Wu
Pengcheng Yin
Eric Xing
77
526
0
28 May 2019
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Tianle Cai
Ruiqi Gao
Jikai Hou
Siyu Chen
Dong Wang
Di He
Zhihua Zhang
Liwei Wang
ODL
67
57
0
28 May 2019
SGD on Neural Networks Learns Functions of Increasing Complexity
Preetum Nakkiran
Gal Kaplun
Dimitris Kalimeris
Tristan Yang
Benjamin L. Edelman
Fred Zhang
Boaz Barak
MLT
140
248
0
28 May 2019
Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness
Pengzhan Jin
Lu Lu
Yifa Tang
George Karniadakis
65
60
0
27 May 2019
Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks
Guy Uziel
BDL
44
0
0
26 May 2019