On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
    ODL
ArXiv (abs) · PDF · HTML

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,554 papers shown
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
124
15
0
07 Jun 2023
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
F. Chen
D. Kunin
Atsushi Yamamura
Surya Ganguli
121
29
0
07 Jun 2023
Normalization Layers Are All That Sharpness-Aware Minimization Needs
Maximilian Mueller
Tiffany J. Vlaar
David Rolnick
Matthias Hein
87
24
0
07 Jun 2023
Optimal Transport Model Distributional Robustness
Van-Anh Nguyen
Trung Le
Anh Tuan Bui
Thanh-Toan Do
Dinh Q. Phung
OOD
77
4
0
07 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
152
15
0
05 Jun 2023
Enhance Diffusion to Improve Robust Generalization
Jianhui Sun
Sanchit Sinha
Aidong Zhang
69
4
0
05 Jun 2023
ReContrast: Domain-Specific Anomaly Detection via Contrastive Reconstruction
Jia Guo
Shuai Lu
Lize Jia
Weihang Zhang
Huiqi Li
94
31
0
05 Jun 2023
When Decentralized Optimization Meets Federated Learning
Hongchang Gao
My T. Thai
Jie Wu
FedML
96
23
0
05 Jun 2023
A Mathematical Abstraction for Balancing the Trade-off Between Creativity and Reality in Large Language Models
Ritwik Sinha
Zhao Song
Dinesh Manocha
102
25
0
04 Jun 2023
Investigating Navigation Strategies in the Morris Water Maze through Deep Reinforcement Learning
A. Liu
Alla Borisyuk
48
7
0
01 Jun 2023
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression
Runtian Zhai
Bing Liu
Andrej Risteski
Zico Kolter
Pradeep Ravikumar
SSL
117
10
0
01 Jun 2023
Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks
Dan Zhao
104
6
0
01 Jun 2023
Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training
Rie Johnson
Tong Zhang
43
6
0
31 May 2023
Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning
M. Yashwanth
Gaurav Kumar Nayak
Aryaveer Singh
Yogesh Singh
Anirban Chakraborty
FedML
98
1
0
31 May 2023
SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Lawrence Wang
Stephen J. Roberts
116
0
0
29 May 2023
The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
Lei Wu
Weijie J. Su
MLT
93
23
0
27 May 2023
Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson
Dongyang Fan
Martin Jaggi
46
1
0
26 May 2023
Batch Model Consolidation: A Multi-Task Model Consolidation Framework
Iordanis Fostiropoulos
Jiaye Zhu
Laurent Itti
MoMe, CLL
54
3
0
25 May 2023
Sharpness-Aware Minimization Leads to Low-Rank Features
Maksym Andriushchenko
Dara Bahri
H. Mobahi
Nicolas Flammarion
AAML
124
25
0
25 May 2023
Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank
Zihan Wang
Arthur Jacot
83
21
0
25 May 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
54
0
0
25 May 2023
Stochastic Modified Equations and Dynamics of Dropout Algorithm
Zhongwang Zhang
Yuqing Li
Yaoyu Zhang
Z. Xu
43
9
0
25 May 2023
Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term
Yun Yue
Jiadi Jiang
Zhiling Ye
Ni Gao
Yongchao Liu
Kecheng Zhang
MLAU, ODL
113
14
0
25 May 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
113
8
0
25 May 2023
The Crucial Role of Normalization in Sharpness-Aware Minimization
Yan Dai
Kwangjun Ahn
S. Sra
120
19
0
24 May 2023
Momentum Provably Improves Error Feedback!
Ilyas Fatkhullin
Alexander Tyurin
Peter Richtárik
116
23
0
24 May 2023
On progressive sharpening, flat minima and generalisation
L. MacDonald
Jack Valmadre
Simon Lucey
80
4
0
24 May 2023
Sharpness-Aware Data Poisoning Attack
Pengfei He
Han Xu
Jie Ren
Yingqian Cui
Hui Liu
Charu C. Aggarwal
Jiliang Tang
AAML
151
8
0
24 May 2023
Transferring Learning Trajectories of Neural Networks
Daiki Chijiwa
57
3
0
23 May 2023
On the Optimal Batch Size for Byzantine-Robust Distributed Learning
Yi-Rui Yang
Chang-Wei Shi
Wu-Jun Li
FedML, AAML
92
0
0
23 May 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo Zhao
Robert Mansel Gower
Robin Walters
Rose Yu
MoMe
127
16
0
22 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
Itai Kreisler
Mor Shpigel Nacson
Daniel Soudry
Y. Carmon
73
15
0
22 May 2023
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy
R. Guerraoui
Ljiljana Dolamic
104
1
0
20 May 2023
Loss Spike in Training Neural Networks
Zhongwang Zhang
Z. Xu
72
7
0
20 May 2023
Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency
Lingfeng Shen
Weiting Tan
Boyuan Zheng
Daniel Khashabi
VLM
127
6
0
18 May 2023
Generalization Bounds for Neural Belief Propagation Decoders
Sudarshan Adiga
Xin Xiao
Ravi Tandon
Bane V. Vasic
Tamal Bose
BDL, AI4CE
64
5
0
17 May 2023
Sharpness & Shift-Aware Self-Supervised Learning
Ngoc N. Tran
S. Duong
Hoang Phan
Tung Pham
Dinh Q. Phung
Trung Le
SSL
71
1
0
17 May 2023
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification
Siyuan Huang
Bo Zhang
Botian Shi
Penglei Gao
Yikang Li
Hongsheng Li
3DPC, 3DGS
72
14
0
16 May 2023
The Hessian perspective into the Nature of Convolutional Neural Networks
Sidak Pal Singh
Thomas Hofmann
Bernhard Schölkopf
96
11
0
16 May 2023
GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong
Joonsang Yu
Geondo Park
Dongyoon Han
Y. Yoo
78
4
0
15 May 2023
Quantization Aware Attack: Enhancing Transferable Adversarial Attacks by Model Quantization
Yulong Yang
Chenhao Lin
Qian Li
Zhengyu Zhao
Haoran Fan
Dawei Zhou
Nannan Wang
Tongliang Liu
Chao Shen
AAML, MQ
130
14
0
10 May 2023
Sharpness-Aware Minimization Alone can Improve Adversarial Robustness
Zeming Wei
Jingyu Zhu
Yihao Zhang
AAML
86
11
0
09 May 2023
Model-agnostic Measure of Generalization Difficulty
Akhilan Boopathy
Kevin Liu
Jaedong Hwang
Shu Ge
Asaad Mohammedsaleh
Ila Fiete
129
4
0
01 May 2023
An Adaptive Policy to Employ Sharpness-Aware Minimization
Weisen Jiang
Hansi Yang
Yu Zhang
James T. Kwok
AAML
128
34
0
28 Apr 2023
Pre-processing training data improves accuracy and generalisability of convolutional neural network based landscape semantic segmentation
A. Clark
S. Phinn
P. Scarth
22
3
0
28 Apr 2023
More Communication Does Not Result in Smaller Generalization Error in Federated Learning
Abdellatif Zaidi
Romain Chor
Milad Sefidgaran
FedML, AI4CE
90
10
0
24 Apr 2023
Hierarchical Weight Averaging for Deep Neural Networks
Xiaozhe Gu
Zixun Zhang
Yuncheng Jiang
Yaoyu Zhang
Ruimao Zhang
Shuguang Cui
Zhuguo Li
52
5
0
23 Apr 2023
Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard
Henry Rees
Guillermo Valle Pérez
A. Louis
UQCV, BDL
62
16
0
13 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
105
43
0
07 Apr 2023
On the Pareto Front of Multilingual Neural Machine Translation
Liang Chen
Shuming Ma
Dongdong Zhang
Furu Wei
Baobao Chang
MoE
79
5
0
06 Apr 2023