On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016 · ODL
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
• The Benefits of Implicit Regularization from SGD in Least Squares Problems · Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean Phillips Foster, Sham Kakade · 10 Aug 2021
• Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters · Chen Sun, Shenggui Li, Jinyue Wang, Jun Yu · 08 Aug 2021
• Convergence of gradient descent for learning linear neural networks · Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege · MLT · 04 Aug 2021
• Batch Normalization Preconditioning for Neural Network Training · Susanna Lange, Kyle E. Helfrich, Qiang Ye · 02 Aug 2021
• Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution · Liangbin Xie, Xintao Wang, Chao Dong, Zhongang Qi, Ying Shan · 02 Aug 2021
• Taxonomizing local versus global structure in neural network loss landscapes · Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney · 23 Jul 2021
• Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II · Yossi Arjevani, M. Field · 21 Jul 2021
• The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion · D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins · 19 Jul 2021
• Rethinking Graph Auto-Encoder Models for Attributed Graph Clustering · Nairouz Mrabah, Mohamed Bouguessa, M. Touati, Riadh Ksantini · 19 Jul 2021
• Point-Cloud Deep Learning of Porous Media for Permeability Prediction · Ali Kashefi, T. Mukerji · 3DPC, AI4CE · 18 Jul 2021
• Globally Convergent Multilevel Training of Deep Residual Networks · Alena Kopanicáková, Rolf Krause · 15 Jul 2021
• AlterSGD: Finding Flat Minima for Continual Learning by Alternative Training · Zhongzhan Huang, Ming Liang, Senwei Liang, Wei He · CLL, ODL · 13 Jul 2021
• SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs · Satyen Kale, Ayush Sekhari, Karthik Sridharan · 11 Jul 2021
• The Bayesian Learning Rule · Mohammad Emtiyaz Khan, Håvard Rue · BDL · 09 Jul 2021
• Activated Gradients for Deep Neural Networks · Mei Liu, Liangming Chen, Xiaohao Du, Long Jin, Mingsheng Shang · ODL, AI4CE · 09 Jul 2021
• Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE · Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, ..., Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao · 02 Jul 2021
• Revisiting Knowledge Distillation: An Inheritance and Exploration Framework · Zhen Huang, Xu Shen, Jun Xing, Tongliang Liu, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua · 01 Jul 2021
• Analytic Insights into Structure and Rank of Neural Network Hessian Maps · Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann · FAtt · 30 Jun 2021
• What can linear interpolation of neural network loss landscapes tell us? · Tiffany J. Vlaar, Jonathan Frankle · MoMe · 30 Jun 2021
• Never Go Full Batch (in Stochastic Convex Optimization) · I Zaghloul Amir, Y. Carmon, Tomer Koren, Roi Livni · 29 Jun 2021
• Implicit Gradient Alignment in Distributed and Federated Learning · Yatin Dandi, Luis Barba, Martin Jaggi · FedML · 25 Jun 2021
• HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters · G. Appleby, M. Espadoto, Rui Chen, Sam Goree, A. Telea, Erik W. Anderson, Remco Chang · 25 Jun 2021
• Sparse Flows: Pruning Continuous-depth Models · Lucas Liebenwein, Ramin Hasani, Alexander Amini, Daniela Rus · 24 Jun 2021
• Minimum sharpness: Scale-invariant parameter-robustness of neural networks · Hikaru Ibayashi, Takuo Hamaguchi, Masaaki Imaizumi · 23 Jun 2021
• Dangers of Bayesian Model Averaging under Covariate Shift · Pavel Izmailov, Patrick K. Nicholson, Sanae Lotfi, A. Wilson · OOD, UQCV, BDL · 22 Jun 2021
• Rethinking Adam: A Twofold Exponential Moving Average Approach · Yizhou Wang, Yue Kang, Can Qin, Huan Wang, Yi Xu, Yulun Zhang, Y. Fu · ODL · 22 Jun 2021
• Open-set Label Noise Can Improve Robustness Against Inherent Label Noise · Hongxin Wei, Lue Tao, Renchunzi Xie, Bo An · NoLa · 21 Jun 2021
• Better Training using Weight-Constrained Stochastic Dynamics · Benedict Leimkuhler, Tiffany J. Vlaar, Timothée Pouchon, Amos Storkey · 20 Jun 2021
• Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples · Xuanyu Wu, Xuhong Li, Haoyi Xiong, Xiao Zhang, Siyu Huang, Dejing Dou · 20 Jun 2021
• Deep Learning Through the Lens of Example Difficulty · R. Baldock, Hartmut Maennel, Behnam Neyshabur · 17 Jun 2021
• Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity · Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion · 17 Jun 2021
• Robust Training in High Dimensions via Block Coordinate Geometric Median Descent · Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu · 16 Jun 2021
• Gradient Forward-Propagation for Large-Scale Temporal Video Modelling · Mateusz Malinowski, Dimitrios Vytiniotis, G. Swirszcz, Viorica Patraucean, João Carreira · 15 Jun 2021
• Economic Nowcasting with Long Short-Term Memory Artificial Neural Networks (LSTM) · D. Hopp · AI4TS · 15 Jun 2021
• On Large-Cohort Training for Federated Learning · Zachary B. Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, Virginia Smith · FedML · 15 Jun 2021
• NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning · Minghan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu · 14 Jun 2021
• Label Noise SGD Provably Prefers Flat Global Minimizers · Alexandru Damian, Tengyu Ma, Jason D. Lee · NoLa · 11 Jun 2021
• The dilemma of quantum neural networks · Yan Qian, Xinbiao Wang, Yuxuan Du, Xingyao Wu, Dacheng Tao · 09 Jun 2021
• What training reveals about neural network complexity · Andreas Loukas, Marinos Poiitis, Stefanie Jegelka · 08 Jun 2021
• Correcting Momentum in Temporal Difference Learning · Emmanuel Bengio, Joelle Pineau, Doina Precup · 07 Jun 2021
• Regularization in ResNet with Stochastic Depth · Soufiane Hayou, Fadhel Ayed · 06 Jun 2021
• RDA: Robust Domain Adaptation via Fourier Adversarial Attacking · Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu · AAML · 05 Jun 2021
• Solving hybrid machine learning tasks by traversing weight space geodesics · G. Raghavan, Matt Thomson · 05 Jun 2021
• Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis · Stephan Wojtowytsch · 04 Jun 2021
• Improving Neural Network Robustness via Persistency of Excitation · Kaustubh Sridhar, O. Sokolsky, Insup Lee, James Weimer · AAML · 03 Jun 2021
• Optimization Variance: Exploring Generalization Properties of DNNs · Xiao Zhang, Dongrui Wu, Haoyi Xiong, Bo Dai · 03 Jun 2021
• When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations · Xiangning Chen, Cho-Jui Hsieh, Boqing Gong · ViT · 03 Jun 2021
• Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics · Charles H. Martin, Michael W. Mahoney · 01 Jun 2021
• Concurrent Adversarial Learning for Large-Batch Training · Yong Liu, Xiangning Chen, Minhao Cheng, Cho-Jui Hsieh, Yang You · ODL · 01 Jun 2021
• A study on the plasticity of neural networks · Tudor Berariu, Wojciech M. Czarnecki, Soham De, J. Bornschein, Samuel L. Smith, Razvan Pascanu, Claudia Clopath · CLL, AI4CE · 31 May 2021

Page 17 of 32