Train longer, generalize better: closing the generalization gap in large batch training of neural networks

24 May 2017

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

50 / 465 papers shown

Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron

Asian Conference on Machine Learning (ACML), 2020

Jun-Kun Wang

Jacob D. Abernethy

265

04 Oct 2020

Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization

Jun-Kun Wang

Jacob D. Abernethy

293

04 Oct 2020

Improved generalization by noise enhancement

Takashi Mori

Masahito Ueda

167

28 Sep 2020

Normalization Techniques in Training DNNs: Methodology, Analysis and Application

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

377

384

27 Sep 2020

Anomalous diffusion dynamics of learning in deep neural networks

Neural Networks (NN), 2020

Guozhang Chen

Chengqing Qu

P. Gong

279

22 Sep 2020

Unsupervised Domain Adaptation by Uncertain Feature Alignment

British Machine Vision Conference (BMVC), 2020

Tobias Ringwald

Rainer Stiefelhagen

155

14 Sep 2020

HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring

202

06 Sep 2020

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima

145

05 Sep 2020

Binary Classification as a Phase Separation Process

Rafael Monteiro

05 Sep 2020

HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks

163

26 Aug 2020

Noise-induced degeneration in online learning

Yuzuru Sato

Daiji Tsutsui

A. Fujiwara

143

24 Aug 2020

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties

Benjamin Kurt Miller

Mario Geiger

Tess E. Smidt

Frank Noé

327

19 Aug 2020

BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition

326

15 Aug 2020

TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search

European Conference on Computer Vision (ECCV), 2020

Yibo Hu

Xiang Wu

Ran He

182

12 Aug 2020

Why to "grow" and "harvest" deep learning models?

I. Kulikovskikh

Tarzan Legović

VLM

08 Aug 2020

Implicit Regularization via Neural Feature Alignment

Pascal Vincent

130

03 Aug 2020

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

Science China Information Sciences (Sci China Inf Sci), 2020

229

28 Jul 2020

A New Look at Ghost Normalization

Neofytos Dimitriou

Ognjen Arandjelovic

227

16 Jul 2020

Analyzing and Mitigating Data Stalls in DNN Training

Proceedings of the VLDB Endowment (PVLDB), 2020

224

120

14 Jul 2020

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning

Peng Jiang

G. Agrawal

150

13 Jul 2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training

International Conference on Machine Learning (ICML), 2020

168

09 Jul 2020

Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization

Rie Johnson

Tong Zhang

30 Jun 2020

Is SGD a Bayesian sampler? Well, almost

Chris Mingard

Guillermo Valle Pérez

Joar Skalse

A. Louis

BDL

303

26 Jun 2020

On the Generalization Benefit of Noise in Stochastic Gradient Descent

217

116

26 Jun 2020

Smooth Adversarial Training

Cihang Xie

Mingxing Tan

222

160

25 Jun 2020

How do SGD hyperparameters in natural training affect adversarial robustness?

122

20 Jun 2020

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training

521

16 Jun 2020

PAC-Bayesian Generalization Bounds for MultiLayer Perceptrons

Xinjie Lan

Xin Guo

Kenneth Barner

195

16 Jun 2020

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

615

109

15 Jun 2020

The Limit of the Batch Size

Yang You

283

15 Jun 2020

Optimization Theory for ReLU Neural Networks Trained with Normalization Layers

International Conference on Machine Learning (ICML), 2020

Yonatan Dukler

Quanquan Gu

Guido Montúfar

206

11 Jun 2020

Extrapolation for Large-batch Training in Deep Learning

International Conference on Machine Learning (ICML), 2020

259

10 Jun 2020

Scaling Distributed Training with Adaptive Summation

116

04 Jun 2020

Inherent Noise in Gradient Based Methods

Arushi Gupta

121

26 May 2020

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems

Preetum Nakkiran

MLT

160

15 May 2020

2kenize: Tying Subword Sequences for Chinese Script Conversion

Pranav A

Isabelle Augenstein

193

07 May 2020

Dynamic backup workers for parallel machine learning

Chuan Xu

Giovanni Neglia

Nicola Sebastianelli

274

30 Apr 2020

The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent

Xin-Yao Qian

Diego Klabjan

ODL

147

27 Apr 2020

SIPA: A Simple Framework for Efficient Networks

121

24 Apr 2020

Predicting the outputs of finite deep neural networks trained with noisy gradients

Physical Review E (PRE), 2020

438

02 Apr 2020

Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models

A. Pătraşcu

C. Paduraru

Paul Irofti

122

30 Mar 2020

Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training

Namhoon Lee

Thalaiyasingam Ajanthan

Juil Sock

Martin Jaggi

207

25 Mar 2020

Robust and On-the-fly Dataset Denoising for Image Classification

European Conference on Computer Vision (ECCV), 2020

186

24 Mar 2020

The Implicit Regularization of Stochastic Gradient Flow for Least Squares

International Conference on Machine Learning (ICML), 2020

Alnur Ali

Guang Cheng

Robert Tibshirani

177

17 Mar 2020

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

249

10 Mar 2020

AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

Majed El Helou

Frederike Dumbgen

Sabine Süsstrunk

CLL AI4CE

134

07 Mar 2020

Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond

313

28 Feb 2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

Soham De

Samuel L. Smith

ODL

248

24 Feb 2020

The Two Regimes of Deep Network Training

Guillaume Leclerc

Aleksander Madry

197

24 Feb 2020

Unique Properties of Flat Minima in Deep Networks

International Conference on Machine Learning (ICML), 2020

Rotem Mulayoff

T. Michaeli

ODL

105

11 Feb 2020