ResearchTrend.AI
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
arXiv: 1609.04836 (v2, latest)

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu
Sang Michael Xie
Zhiyuan Li
Tengyu Ma
AI4CE
126
55
0
25 Oct 2022
Deep Neural Networks as the Semi-classical Limit of Topological Quantum Neural Networks: The problem of generalisation
A. Marcianò
De-Wei Chen
Filippo Fabrocini
C. Fields
M. Lulli
Emanuele Zappala
GNN
29
5
0
25 Oct 2022
Sufficient Invariant Learning for Distribution Shift
Taero Kim
Sungjun Lim
Kyungwoo Song
OOD
66
2
0
24 Oct 2022
K-SAM: Sharpness-Aware Minimization at the Speed of SGD
Renkun Ni
Ping Yeh-Chiang
Jonas Geiping
Micah Goldblum
A. Wilson
Tom Goldstein
64
9
0
23 Oct 2022
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
O. Oyedotun
Konstantinos Papadopoulos
Djamila Aouada
AI4CE
73
12
0
21 Oct 2022
Large-batch Optimization for Dense Visual Predictions
Zeyue Xue
Jianming Liang
Guanglu Song
Zhuofan Zong
Liang Chen
Yu Liu
Ping Luo
VLM
96
9
0
20 Oct 2022
Motion correction in MRI using deep learning and a novel hybrid loss function
Lei Zhang
Xiaoke Wang
Michael Rawson
R. Balan
E. Herskovits
E. Melhem
Linda Chang
Ze Wang
T. Ernst
MedIm
84
13
0
19 Oct 2022
Rethinking Sharpness-Aware Minimization as Variational Inference
Szilvia Ujváry
Zsigmond Telek
A. Kerekes
Anna Mészáros
Ferenc Huszár
63
8
0
19 Oct 2022
Vision Transformers provably learn spatial structure
Samy Jelassi
Michael E. Sander
Yuan-Fang Li
ViT, MLT
100
83
0
13 Oct 2022
SQuAT: Sharpness- and Quantization-Aware Training for BERT
Zheng Wang
Juncheng Billy Li
Shuhui Qu
Florian Metze
Emma Strubell
MQ
42
7
0
13 Oct 2022
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Zhiyuan Zhang
Ruixuan Luo
Qi Su
Xueting Sun
105
13
0
13 Oct 2022
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks
A. K. Akash
Sixu Li
Nicolas García Trillos
71
13
0
13 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
107
51
0
13 Oct 2022
On the Effectiveness of Lipschitz-Driven Rehearsal in Continual Learning
Lorenzo Bonicelli
Matteo Boschini
Angelo Porrello
C. Spampinato
Simone Calderara
CLL
72
48
0
12 Oct 2022
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models
Qihuang Zhong
Liang Ding
Li Shen
Peng Mi
Juhua Liu
Bo Du
Dacheng Tao
AAML
90
51
0
11 Oct 2022
Stable and Efficient Adversarial Training through Local Linearization
Zhuorong Li
Daiwei Yu
AAML
32
0
0
11 Oct 2022
SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko
Aditya Varre
Loucas Pillaud-Vivien
Nicolas Flammarion
136
60
0
11 Oct 2022
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Peng Mi
Li Shen
Tianhe Ren
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
Dacheng Tao
AAML
116
71
0
11 Oct 2022
TAN Without a Burn: Scaling Laws of DP-SGD
Tom Sander
Pierre Stock
Alexandre Sablayrolles
FedML
86
43
0
07 Oct 2022
Invariant Aggregator for Defending against Federated Backdoor Attacks
Xiaoya Wang
Dimitrios Dimitriadis
Oluwasanmi Koyejo
Shruti Tople
FedML
89
1
0
04 Oct 2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah
Vikrant Varma
Ramana Kumar
Mary Phuong
Victoria Krakovna
J. Uesato
Zachary Kenton
92
72
0
04 Oct 2022
MEDFAIR: Benchmarking Fairness for Medical Imaging
Yongshuo Zong
Yongxin Yang
Timothy M. Hospedales
OOD
173
65
0
04 Oct 2022
TripleE: Easy Domain Generalization via Episodic Replay
Xuelong Li
Hongyu Ren
Huifeng Yao
Ziwei Liu
26
0
0
04 Oct 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Peter L. Bartlett
Philip M. Long
Olivier Bousquet
162
37
0
04 Oct 2022
Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
Pengfei Zheng
Rui Pan
Tarannum Khan
Shivaram Venkataraman
Aditya Akella
88
22
0
30 Sep 2022
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian
Eshaan Nichani
Jason D. Lee
107
88
0
30 Sep 2022
Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
Sungyub Kim
Si-hun Park
Kyungsu Kim
Eunho Yang
BDL
79
5
0
30 Sep 2022
Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization
Danni Peng
Sinno Jialin Pan
64
3
0
29 Sep 2022
Label driven Knowledge Distillation for Federated Learning with non-IID Data
Minh-Duong Nguyen
Quoc-Viet Pham
D. Hoang
Long Tran-Thanh
Diep N. Nguyen
Won Joo Hwang
69
2
0
29 Sep 2022
Exploring the Relationship between Architecture and Adversarially Robust Generalization
Aishan Liu
Shiyu Tang
Siyuan Liang
Ruihao Gong
Boxi Wu
Xianglong Liu
Dacheng Tao
AAML
93
19
0
28 Sep 2022
A micromechanics-based recurrent neural networks model for path-dependent cyclic deformation of short fiber composites
J. Friemann
B. Dashtbozorg
Mikael Fagerström
S. Mirkhalaf
AI4CE
73
19
0
27 Sep 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity
Benoit Dherin
Michael Munn
M. Rosca
David Barrett
133
31
0
27 Sep 2022
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization
Gábor Melis
MoMe
93
1
0
26 Sep 2022
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison
Luke Metz
Jascha Narain Sohl-Dickstein
112
21
0
22 Sep 2022
Deep Double Descent via Smooth Interpolation
Matteo Gamba
Erik Englesson
Mårten Björkman
Hossein Azizpour
169
11
0
21 Sep 2022
Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning
Christian Raymond
Qi Chen
Bing Xue
Mengjie Zhang
FedML
83
13
0
19 Sep 2022
Is Stochastic Gradient Descent Near Optimal?
Yifan Zhu
Hong Jun Jeon
Benjamin Van Roy
69
2
0
18 Sep 2022
Towards Bridging the Performance Gaps of Joint Energy-based Models
Xiulong Yang
Qing Su
Shihao Ji
VLM
65
15
0
16 Sep 2022
Losing momentum in continuous-time stochastic optimisation
Kexin Jin
J. Latz
Chenguang Liu
Alessandro Scagliotti
52
2
0
08 Sep 2022
Information Maximization for Extreme Pose Face Recognition
Mohammad Saeed Ebrahimi Saadabadi
Sahar Rahimi Malakshan
Sobhan Soleymani
Moktari Mostofa
Nasser M. Nasrabadi
CVBM
59
5
0
07 Sep 2022
Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico
Tyler Farghly
George Deligiannidis
Benjamin Guedj
Arnaud Doucet
152
4
0
06 Sep 2022
Investigating the Impact of Model Misspecification in Neural Simulation-based Inference
Patrick W Cannon
Daniel Ward
Sebastian M. Schmon
78
36
0
05 Sep 2022
Super-model ecosystem: A domain-adaptation perspective
Fengxiang He
Dacheng Tao
DiffM
84
1
0
30 Aug 2022
Visualizing high-dimensional loss landscapes with Hessian directions
Lucas Böttcher
Gregory R. Wheeler
79
14
0
28 Aug 2022
On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML, AI4CE
91
81
0
26 Aug 2022
FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification
Yunqing Zhao
Ngai-Man Cheung
BDL
65
13
0
23 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park
Sangdoo Yun
Sanghyuk Chun
AAML
83
32
0
21 Aug 2022
Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification
Quanshi Zhang
Xu Cheng
Yilan Chen
Zhefan Rao
56
36
0
18 Aug 2022
Object Detection for Autonomous Dozers
Chunfang Liu
Burhaneddin Yaman
68
2
0
17 Aug 2022
On the generalization of learning algorithms that do not converge
N. Chandramoorthy
Andreas Loukas
Khashayar Gatmiry
Stefanie Jegelka
MLT
91
11
0
16 Aug 2022