1609.04836

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016

Dheevatsa Mudigere

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,653 papers shown

How Does Batch Normalization Help Optimization?

Shibani Santurkar

Dimitris Tsipras

Aleksander Madry

547

1,670

0

29 May 2018

Distilling Knowledge for Search-based Structured Prediction

106

22

0

29 May 2018

Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling

84

4

0

28 May 2018

A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition

Alireza Sepas-Moghaddam

Kamal Nasrollahi

146

38

0

25 May 2018

Local SGD Converges Fast and Communicates Little

Sebastian U. Stich

1.1K

1,196

0

24 May 2018

Input and Weight Space Smoothing for Semi-supervised Learning

129

6

0

23 May 2018

Deep learning generalizes because the parameter-function map is biased towards simple functions

Guillermo Valle Pérez

Chico Q. Camargo

459

256

0

22 May 2018

Gradient Energy Matching for Distributed Asynchronous Gradient Descent

128

5

0

22 May 2018

Stochastic modified equations for the asynchronous stochastic gradient descent

173

78

0

21 May 2018

Never look back - A modified EnKF method and its application to the training of neural networks without back propagation

245

33

0

21 May 2018

SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning

Yiran Chen

241

54

0

21 May 2018

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

Amar Phanishayee

Arvind Krishnamurthy

339

128

0

21 May 2018

DNN or k-NN: That is the Generalize vs. Memorize Question

Guillermo Sapiro

315

40

0

17 May 2018

Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling

Chien-chin Huang

94

25

0

10 May 2018

On Visual Hallmarks of Robustness to Adversarial Malware

Abdullah Al-Dujaili

Una-May O’Reilly

143

8

0

09 May 2018

SaaS: Speed as a Supervisor for Semi-supervised Learning

Alhussein Fawzi

196

20

0

02 May 2018

SHADE: Information Based Regularization for Deep Learning

289

12

0

29 Apr 2018

HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering

122

45

0

25 Apr 2018

Path Planning in Support of Smart Mobility Applications using Generative Adversarial Networks

Ala I. Al-Fuqaha

233

25

0

23 Apr 2018

Revisiting Small Batch Training for Deep Neural Networks

Dominic Masters

Carlo Luschi

179

735

0

20 Apr 2018

Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach

Morgane Austern

326

228

0

16 Apr 2018

DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction

Ruiming Tang

247

65

0

12 Apr 2018

Large scale distributed neural network training through online distillation

Gabriel Pereyra

Alexandre Passos

Róbert Ormándi

Geoffrey E. Hinton

681

441

0

09 Apr 2018

The Loss Surface of XOR Artificial Neural Networks

Edgar A. Bernal

323

19

0

06 Apr 2018

Training Tips for the Transformer Model

482

326

0

01 Apr 2018

Online Second Order Methods for Non-Convex Stochastic Optimizations

111

4

0

26 Mar 2018

On the Local Minima of the Empirical Risk

323

58

0

25 Mar 2018

Multiple Sclerosis Lesion Segmentation from Brain MRI via Fully Convolutional Neural Networks

121

93

0

24 Mar 2018

A high-bias, low-variance introduction to Machine Learning for physicists

Marin Bukov

Charles K. Fisher

397

951

0

23 Mar 2018

Gradient Descent Quantizes ReLU Network Features

Hartmut Maennel

Olivier Bousquet

148

88

0

22 Mar 2018

Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks

153

27

0

22 Mar 2018

Assessing Shape Bias Property of Convolutional Neural Networks

Hossein Hosseini

Mayoore S. Jaiswal

Radha Poovendran

145

39

0

21 Mar 2018

Comparing Dynamics: Deep Neural Networks versus Glassy Systems

Gerard Ben Arous

343

124

0

19 Mar 2018

On the importance of single directions for generalization

Neil C. Rabinowitz

456

348

0

19 Mar 2018

On the insufficiency of existing momentum schemes for Stochastic Optimization

Information Theory and Applications Workshop (ITA), 2018

Praneeth Netrapalli

267

130

0

15 Mar 2018

Averaging Weights Leads to Wider Optima and Better Generalization

Conference on Uncertainty in Artificial Intelligence (UAI), 2018

Dmitrii Podoprikhin

Dmitry Vetrov

649

1,898

0

14 Mar 2018

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling

Sayed Hadi Hashemi

Sangeetha Abdu Jyothi

170

206

0

08 Mar 2018

Essentially No Barriers in Neural Network Energy Landscape

International Conference on Machine Learning (ICML), 2018

594

491

0

02 Mar 2018

The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects

239

40

0

01 Mar 2018

Neural Inverse Rendering for General Reflectance Photometric Stereo

International Conference on Machine Learning (ICML), 2018

Tatsunori Taniai

Takanori Maehara

268

112

0

28 Feb 2018

Semi-Supervised Learning Enabled by Multiscale Deep Neural Network Inversion

Randall Balestriero

Richard Baraniuk

184

6

0

27 Feb 2018

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Neural Information Processing Systems (NeurIPS), 2018

Dmitrii Podoprikhin

Dmitry Vetrov

677

852

0

27 Feb 2018

Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior

IEEE Transactions on Computational Imaging (TCI), 2018

Anil Kumar Vadathya

185

24

0

27 Feb 2018

A Walk with SGD

Christos Tsirigotis

349

133

0

24 Feb 2018

Sensitivity and Generalization in Neural Networks: an Empirical Study

Daniel A. Abolafia

Jeffrey Pennington

Jascha Narain Sohl-Dickstein

413

482

0

23 Feb 2018

Characterizing Implicit Bias in Terms of Optimization Geometry

Suriya Gunasekar

548

436

0

22 Feb 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Michael W. Mahoney

449

177

0

22 Feb 2018

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

Nicholas Carlini

Ulfar Erlingsson

768

1,322

0

22 Feb 2018

Improved Techniques For Weakly-Supervised Object Localization

275

8

0

22 Feb 2018

An Alternative View: When Does SGD Escape Local Minima?

Robert D. Kleinberg

347

332

0

17 Feb 2018

Page 31 of 34
