Train longer, generalize better: closing the generalization gap in large batch training of neural networks

24 May 2017

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

50 / 465 papers shown

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

Jimmy Ba

21 Feb 2019

Random Search and Reproducibility for Neural Architecture Search

Liam Li

Ameet Talwalkar

OOD

20 Feb 2019

Uniform convergence may be unable to explain generalization in deep learning

Neural Information Processing Systems (NeurIPS), 2019

Vaishnavh Nagarajan

J. Zico Kolter

MoMe AI4CE

13 Feb 2019

Asymmetric Valleys: Beyond Sharp and Flat Local Minima

Neural Information Processing Systems (NeurIPS), 2019

Haowei He

Gao Huang

Yang Yuan

ODL MLT

02 Feb 2019

Compressing Gradient Optimizers via Count-Sketches

International Conference on Machine Learning (ICML), 2019

Ryan Spring

Anastasios Kyrillidis

Vijai Mohan

Anshumali Shrivastava

01 Feb 2019

Augment your batch: better training with larger batches

27 Jan 2019

Traditional and Heavy-Tailed Self Regularization in Neural Network Models

Charles H. Martin

Michael W. Mahoney

24 Jan 2019

Large-Batch Training for LSTM and Beyond

Yang You

24 Jan 2019

Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

Vardan Papyan

24 Jan 2019

A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

Umut Simsekli

Levent Sagun

Mert Gurbuzbalaban

18 Jan 2019

Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks using PAC-Bayesian Analysis

Yusuke Tsuzuku

Issei Sato

Masashi Sugiyama

15 Jan 2019

CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers

A. Koliousis

Pijika Watcharapichat

08 Jan 2019

Generalization in Deep Networks: The Role of Distance from Initialization

Vaishnavh Nagarajan

J. Zico Kolter

ODL

07 Jan 2019

Scaling description of generalization with number of parameters in deep learning

06 Jan 2019

A continuous-time analysis of distributed stochastic gradient

Nicholas M. Boffi

Jean-Jacques E. Slotine

28 Dec 2018

NIPS - Not Even Wrong? A Systematic Review of Empirically Complete Demonstrations of Algorithmic Effectiveness in the Machine Learning and Artificial Intelligence Literature

Franz J. Király

Bilal A. Mateen

R. Sonabend

18 Dec 2018

An Empirical Model of Large-Batch Training

14 Dec 2018

Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training

Saurabh N. Adya

Vinay Palakkode

Oncel Tuzel

07 Dec 2018

Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent

Xiaowu Dai

Yuhua Zhu

03 Dec 2018

Stochastic Training of Residual Networks: a Differential Equation Viewpoint

Qi Sun

Yunzhe Tao

Q. Du

01 Dec 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent

30 Nov 2018

LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks

Pramod Viswanath

30 Nov 2018

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks

29 Nov 2018

Deep learning for pedestrians: backpropagation in CNNs

L. Boué

3DV PINN

29 Nov 2018

Neural Sign Language Translation based on Human Keypoint Estimation

28 Nov 2018

Deep Frank-Wolfe For Neural Network Optimization

International Conference on Learning Representations (ICLR), 2018

19 Nov 2018

Image Classification at Supercomputer Scale

16 Nov 2018

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash

Hiroaki Mikami

Hisahiro Suganuma

Pongsakorn U-chupala

Yoshiki Tanaka

Yuichi Kageyama

13 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training

Journal of Machine Learning Research (JMLR), 2018

Christopher J. Shallue

08 Nov 2018

A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation

Akhilesh Deepak Gotmare

29 Oct 2018

Three Mechanisms of Weight Decay Regularization

29 Oct 2018

A jamming transition from under- to over-parametrization affects loss landscape and generalization

22 Oct 2018

A Closer Look at Structured Pruning for Neural Network Compression

Elliot J. Crowley

Jack Turner

Amos Storkey

Michael F. P. O'Boyle

3DPC

10 Oct 2018

Learning to Segment Inputs for NMT Favors Character-Level Processing

Julia Kreutzer

Artem Sokolov

02 Oct 2018

Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning

Charles H. Martin

Michael W. Mahoney

AI4CE

02 Oct 2018

Large batch size training of neural networks with adversarial training and second-order information

02 Oct 2018

Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning

Cheolhyoung Lee

Dong Wang

Wanmo Kang

29 Sep 2018

The jamming transition as a paradigm to understand the loss landscape of deep neural networks

Physical Review E (PRE), 2018

25 Sep 2018

Identifying Generalization Properties in Neural Networks

Huan Wang

N. Keskar

Caiming Xiong

R. Socher

19 Sep 2018

Removing the Feature Correlation Effect of Multiplicative Noise

Zijun Zhang

Yining Zhang

Zongpeng Li

19 Sep 2018

Don't Use Large Mini-Batches, Use Local SGD

22 Aug 2018

Large Scale Language Modeling: Converging on 40GB of Text in Four Hours

03 Aug 2018

Generalization Error in Deep Learning

03 Aug 2018

A New Benchmark and Progress Toward Improved Weakly Supervised Learning

British Machine Vision Conference (BMVC), 2018

Jason Ramapuram

Russ Webb

SSL

30 Jun 2018

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Quanquan Gu

18 Jun 2018

Full deep neural network training on a pruned weight budget

Maximilian Golub

G. Lemieux

Mieszko Lis

11 Jun 2018

The Effect of Network Width on the Performance of Large-batch Training

Lingjiao Chen

Hongyi Wang

Jinman Zhao

Dimitris Papailiopoulos

Paraschos Koutris

11 Jun 2018

Training Faster by Separating Modes of Variation in Batch-normalized Models

Mahdi M. Kalayeh

M. Shah

07 Jun 2018

Implicit regularization and solution uniqueness in over-parameterized matrix sensing

Kelly Geyer

Anastasios Kyrillidis

A. Kalev

06 Jun 2018

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

05 Jun 2018