arXiv: 1705.08741
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
24 May 2017
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
Papers citing
"Train longer, generalize better: closing the generalization gap in large batch training of neural networks"
50 / 465 papers shown
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie
Issei Sato
Masashi Sugiyama
ODL
418
18
0
10 Feb 2020
Large Batch Training Does Not Need Warmup
Zhouyuan Huo
Bin Gu
Heng Huang
AI4CE
ODL
157
5
0
04 Feb 2020
Variance Reduction with Sparse Gradients
International Conference on Learning Representations (ICLR), 2020
Melih Elibol
Lihua Lei
Sai Li
131
24
0
27 Jan 2020
Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
International Conference on Learning Representations (ICLR), 2020
Jinlong Liu
Guo-qing Jiang
Yunzhi Bai
Ting Chen
Huayan Wang
AI4CE
354
57
0
21 Jan 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
International Conference on Learning Representations (ICLR), 2020
Vipul Gupta
S. Serrano
D. DeCoste
MoMe
290
73
0
07 Jan 2020
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Umut Simsekli
Mert Gurbuzbalaban
T. H. Nguyen
G. Richard
Levent Sagun
323
64
0
29 Nov 2019
Auto-Precision Scaling for Distributed Deep Learning
Information Security Conference (IS), 2019
Ruobing Han
J. Demmel
Yang You
171
5
0
20 Nov 2019
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Shiori Sagawa
Pang Wei Koh
Tatsunori B. Hashimoto
Abigail Z. Jacobs
OOD
290
1,451
0
20 Nov 2019
Information-Theoretic Local Minima Characterization and Regularization
International Conference on Machine Learning (ICML), 2019
Zhiwei Jia
Hao Su
243
22
0
19 Nov 2019
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
Neural Information Processing Systems (NeurIPS), 2019
Maximilian Igl
K. Ciosek
Yingzhen Li
Sebastian Tschiatschek
Cheng Zhang
Sam Devlin
Katja Hofmann
OffRL
220
188
0
28 Oct 2019
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Koyel Mukherjee
Alind Khare
Ashish Verma
149
20
0
25 Oct 2019
Gradient Sparsification for Asynchronous Distributed Training
Zijie Yan
FedML
63
2
0
24 Oct 2019
Improved Generalization Bounds of Group Invariant / Equivariant Deep Networks via Quotient Feature Spaces
Conference on Uncertainty in Artificial Intelligence (UAI), 2019
Akiyoshi Sannai
Masaaki Imaizumi
M. Kawano
MLT
214
35
0
15 Oct 2019
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
459
289
0
11 Oct 2019
SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead
IEEE transactions on computers (IEEE Trans. Comput.), 2019
A. Masullo
Ligang He
Toby Perrett
Rui Mao
Carsten Maple
Majid Mirmehdi
783
387
0
03 Oct 2019
How noise affects the Hessian spectrum in overparameterized neural networks
Ming-Bo Wei
D. Schwab
259
32
0
01 Oct 2019
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
International Conference on Learning Representations (ICLR), 2019
Niv Giladi
Mor Shpigel Nacson
Elad Hoffer
Daniel Soudry
193
23
0
26 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
International Conference on Learning Representations (ICLR), 2019
Cheolhyoung Lee
Dong Wang
Wanmo Kang
MoE
503
228
0
25 Sep 2019
Scalable Kernel Learning via the Discriminant Information
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Mert Al
Zejiang Hou
S. Kung
143
1
0
23 Sep 2019
TabNet: Attentive Interpretable Tabular Learning
AAAI Conference on Artificial Intelligence (AAAI), 2019
Sercan O. Arik
Tomas Pfister
LMTD
819
1,859
0
20 Aug 2019
Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Hao Jin
Dachao Lin
Zhihua Zhang
ODL
108
2
0
18 Aug 2019
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
Elad Hoffer
Berry Weinstein
Itay Hubara
Tal Ben-Nun
Torsten Hoefler
Daniel Soudry
210
25
0
12 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
IEEE Micro (IEEE Micro), 2019
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
271
68
0
30 Jul 2019
Bias of Homotopic Gradient Descent for the Hinge Loss
Applied Mathematics and Optimization (AMO), 2019
Denali Molitor
Deanna Needell
Rachel A. Ward
121
6
0
26 Jul 2019
Learning Neural Networks with Adaptive Regularization
Neural Information Processing Systems (NeurIPS), 2019
Han Zhao
Yifan Hao
Ruslan Salakhutdinov
Geoffrey J. Gordon
108
16
0
14 Jul 2019
Faster Neural Network Training with Data Echoing
Dami Choi
Alexandre Passos
Christopher J. Shallue
George E. Dahl
350
51
0
12 Jul 2019
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
Yuanzhi Li
Colin Wei
Tengyu Ma
312
328
0
10 Jul 2019
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
Neural Information Processing Systems (NeurIPS), 2019
Guodong Zhang
Lala Li
Zachary Nado
James Martens
Sushant Sachdeva
George E. Dahl
Christopher J. Shallue
Roger C. Grosse
418
176
0
09 Jul 2019
Stochastic Gradient and Langevin Processes
Xiang Cheng
Dong Yin
Peter L. Bartlett
Sai Li
275
5
0
07 Jul 2019
Time-to-Event Prediction with Neural Networks and Cox Regression
Journal of machine learning research (JMLR), 2019
Håvard Kvamme
Ørnulf Borgan
Ida Scheel
563
404
0
01 Jul 2019
On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu
Wenqing Hu
Haoyi Xiong
Jun Huan
Vladimir Braverman
Zhanxing Zhu
MLT
221
10
0
18 Jun 2019
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
Samet Oymak
Zalan Fabian
Mingchen Li
Mahdi Soltanolkotabi
MLT
239
100
0
12 Jun 2019
Toward Interpretable Music Tagging with Self-Attention
Minz Won
Sanghyuk Chun
Xavier Serra
ViT
168
85
0
12 Jun 2019
The Implicit Bias of AdaGrad on Separable Data
Neural Information Processing Systems (NeurIPS), 2019
Qian Qian
Xiaoyuan Qian
132
24
0
09 Jun 2019
Four Things Everyone Should Know to Improve Batch Normalization
International Conference on Learning Representations (ICLR), 2019
Cecilia Summers
M. Dinneen
202
56
0
09 Jun 2019
Inductive Bias of Gradient Descent based Adversarial Training on Separable Data
Yan Li
Ethan X. Fang
Huan Xu
T. Zhao
269
18
0
07 Jun 2019
Automated Machine Learning: State-of-The-Art and Open Challenges
Radwa El Shawi
Mohamed Maher
Sherif Sakr
187
189
0
05 Jun 2019
Implicit Regularization in Deep Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2019
Sanjeev Arora
Nadav Cohen
Wei Hu
Yuping Luo
AI4CE
396
562
0
31 May 2019
Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Neural Information Processing Systems (NeurIPS), 2019
Aditya Golatkar
Alessandro Achille
Stefano Soatto
147
105
0
30 May 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
International Conference on Machine Learning (ICML), 2019
Mor Shpigel Nacson
Suriya Gunasekar
Jason D. Lee
Nathan Srebro
Daniel Soudry
195
96
0
17 May 2019
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Workshop on Deep Learning on Supercomputers (DLS), 2019
Wu Dong
Murat Keçeli
Rafael Vescovi
Hanyu Li
Corey Adams
...
T. Uram
V. Vishwanath
N. Ferrier
B. Kasthuri
P. Littlewood
FedML
AI4CE
334
10
0
13 May 2019
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Neural Information Processing Systems (NeurIPS), 2019
Colin Wei
Tengyu Ma
382
122
0
09 May 2019
Batch Normalization is a Cause of Adversarial Vulnerability
A. Galloway
A. Golubeva
T. Tanay
M. Moussa
Graham W. Taylor
ODL
AAML
239
84
0
06 May 2019
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Yanghua Peng
Hang Zhang
Yifei Ma
Tong He
Zhi-Li Zhang
Sheng Zha
Mu Li
171
24
0
26 Apr 2019
Low-Memory Neural Network Training: A Technical Report
N. Sohoni
Christopher R. Aberger
Megan Leszczynski
Jian Zhang
Christopher Ré
254
110
0
24 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You
Jing Li
Sashank J. Reddi
Jonathan Hseu
Sanjiv Kumar
Srinadh Bhojanapalli
Xiaodan Song
J. Demmel
Kurt Keutzer
Cho-Jui Hsieh
ODL
887
1,113
0
01 Apr 2019
On the Stability and Generalization of Learning with Kernel Activation Functions
M. Cirillo
Simone Scardapane
S. Van Vaerenbergh
A. Uncini
138
0
0
28 Mar 2019
TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications
Frederik Heber
Zofia Trstanova
Benedict Leimkuhler
173
0
0
20 Mar 2019
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma
Gabe Montague
Jiayu Ye
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
214
24
0
14 Mar 2019
Communication-efficient distributed SGD with Sketching
Nikita Ivkin
D. Rothchild
Enayat Ullah
Vladimir Braverman
Ion Stoica
R. Arora
FedML
269
220
0
12 Mar 2019
Page 7 of 10