On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL
ArXiv (abs) · PDF · HTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

Showing 50 of 1,554 citing papers.

Communication-efficient Decentralized Machine Learning over Heterogeneous Networks
Pan Zhou, Qian Lin, Dumitrel Loghin, Beng Chin Ooi, Yuncheng Wu, Hongfang Yu · 12 Sep 2020

Achieving Adversarial Robustness via Sparsity
Shu-Fan Wang, Ningyi Liao, Liyao Xiang, Nanyang Ye, Quanshi Zhang · AAML · 11 Sep 2020

Self-Adaptive Physics-Informed Neural Networks using a Soft Attention Mechanism
L. McClenny, U. Braga-Neto · PINN · 07 Sep 2020

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin · ODL · 05 Sep 2020

Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance
A. Lohn · 02 Sep 2020

Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur · 31 Aug 2020

Predicting Training Time Without Training
Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto · 28 Aug 2020

Adversarially Robust Learning via Entropic Regularization
Gauri Jagatap, Ameya Joshi, A. B. Chowdhury, S. Garg, Chinmay Hegde · OOD · 27 Aug 2020

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing · VLM · 27 Aug 2020

Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra
Vardan Papyan · 27 Aug 2020

What is being transferred in transfer learning?
Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang · 26 Aug 2020

HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks
Sam Verboven, M. H. Chaudhary, Jeroen Berrevoets, Wouter Verbeke · 26 Aug 2020

Noise-induced degeneration in online learning
Yuzuru Sato, Daiji Tsutsui, A. Fujiwara · 24 Aug 2020

XNAP: Making LSTM-based Next Activity Predictions Explainable by Using LRP
Sven Weinzierl, Sandra Zilker, Jens Brunk, K. Revoredo, Martin Matzner, J. Becker · 18 Aug 2020

Adversarial Concurrent Training: Optimizing Robustness and Accuracy Trade-off of Deep Neural Networks
Elahe Arani, F. Sarfraz, Bahram Zonooz · AAML · 16 Aug 2020

Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training
Geoffrey X. Yu, Tovi Grossman, Gennady Pekhimenko · 15 Aug 2020

BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition
Y. Kim, Wonpyo Park, Jongju Shin · CVBM · 15 Aug 2020

Optimizing Information Loss Towards Robust Neural Networks
Philip Sperl, Konstantin Böttinger · AAML · 07 Aug 2020

Neural Complexity Measures
Yoonho Lee, Juho Lee, Sung Ju Hwang, Eunho Yang, Seungjin Choi · 07 Aug 2020

Communication-Efficient and Distributed Learning Over Wireless Networks: Principles and Applications
Jihong Park, S. Samarakoon, Anis Elgabli, Joongheon Kim, M. Bennis, Seong-Lyun Kim, Mérouane Debbah · 06 Aug 2020

Wasserstein-based Projections with Applications to Inverse Problems
Howard Heaton, Samy Wu Fung, A. Lin, Stanley Osher, W. Yin · 05 Aug 2020

Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
Yossi Arjevani, M. Field · 04 Aug 2020

MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
Jun Shu, Yanwen Zhu, Qian Zhao, Zongben Xu, Deyu Meng · 29 Jul 2020

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li · ODL · 28 Jul 2020

AutoClip: Adaptive Gradient Clipping for Source Separation Networks
Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux · 25 Jul 2020

Neural networks with late-phase weights
J. Oswald, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin Grewe, João Sacramento · 25 Jul 2020

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Yosuke Oyama, N. Maruyama, Nikoli Dryden, Erin McCarthy, P. Harrington, J. Balewski, Satoshi Matsuoka, Peter Nugent, B. Van Essen · 3DV · AI4CE · 25 Jul 2020

Linear discriminant initialization for feed-forward neural networks
Marissa Masden, D. Sinha · FedML · 24 Jul 2020

Deforming the Loss Surface
Liangming Chen, Long Jin, Xiujuan Du, Shuai Li, Mei Liu · ODL · 24 Jul 2020

Randomized Automatic Differentiation
Deniz Oktay, N. McGreivy, Joshua Aduol, Alex Beatson, Ryan P. Adams · ODL · 20 Jul 2020

On regularization of gradient descent, layer imbalance and flat minima
Boris Ginsburg · 18 Jul 2020

Understanding Implicit Regularization in Over-Parameterized Single Index Model
Jianqing Fan, Zhuoran Yang, Mengxin Yu · 16 Jul 2020

Data-driven effective model shows a liquid-like deep learning
Wenxuan Zou, Haiping Huang · 16 Jul 2020

Explicit Regularisation in Gaussian Noise Injections
A. Camuto, M. Willetts, Umut Simsekli, Stephen J. Roberts, Chris Holmes · 14 Jul 2020

Beyond Graph Neural Networks with Lifted Relational Neural Networks
Gustav Sourek, F. Železný, Ondrej Kuzelka · NAI · 13 Jul 2020

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, G. Agrawal · 13 Jul 2020

A Study of Gradient Variance in Deep Learning
Fartash Faghri, David Duvenaud, David J. Fleet, Jimmy Ba · FedML · ODL · 09 Jul 2020

Distributed Training of Deep Learning Models: A Taxonomic Perspective
M. Langer, Zhen He, W. Rahayu, Yanbo Xue · 08 Jul 2020

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan · 07 Jul 2020

Predicting Porosity, Permeability, and Tortuosity of Porous Media from Images by Deep Learning
K. Graczyk, M. Matyka · 3DV · AI4CE · 06 Jul 2020

Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
Yi Chen, Jinglin Chen, Jing-rong Dong, Jian-wei Peng, Zhaoran Wang · 04 Jul 2020

Variance reduction for Riemannian non-convex optimization with batch size adaptation
Andi Han, Junbin Gao · 03 Jul 2020

The Global Landscape of Neural Networks: An Overview
Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant · 02 Jul 2020

Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning
Lionel Blondé, Pablo Strasser, Alexandros Kalousis · 28 Jun 2020

Is SGD a Bayesian sampler? Well, almost
Chris Mingard, Guillermo Valle Pérez, Joar Skalse, A. Louis · BDL · 26 Jun 2020

On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith, Erich Elsen, Soham De · MLT · 26 Jun 2020

Effective Elastic Scaling of Deep Learning Workloads
Vaibhav Saxena, K.R. Jayaram, Saurav Basu, Yogish Sabharwal, Ashish Verma · 24 Jun 2020

Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu · 24 Jun 2020

Understanding Deep Architectures with Reasoning Layer
Xinshi Chen, Yufei Zhang, C. Reisinger, Le Song · AI4CE · 24 Jun 2020

Exploiting Contextual Information with Deep Neural Networks
Ismail Elezi · 21 Jun 2020