ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 September 2016

Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"

50 / 1,585 papers shown

Hidden Activations Are Not Enough: A General Approach to Neural Network Predictions
Samuel Leblanc, Aiky Rasolomanana, Marco Armenta · 20 Sep 2024

Efficient Training of Deep Neural Operator Networks via Randomized Sampling
Sharmila Karumuri, Lori Graham-Brady, Somdatta Goswami · 20 Sep 2024

Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
Hinata Harada, Hideaki Iiduka · 16 Sep 2024

WaterMAS: Sharpness-Aware Maximization for Neural Network Watermarking
Carl De Sousa Trias, Mihai P. Mitrea, Attilio Fiandrotti, Marco Cagnazzo, Sumanta Chaudhuri, Enzo Tartaglione · AAML · 05 Sep 2024

Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization
Nayeong Kim, Juwon Kang, Sungsoo Ahn, Jungseul Ok, Suha Kwak · 05 Sep 2024

CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji · 02 Sep 2024

Fisher Information guided Purification against Backdoor Attacks
Nazmul Karim, Abdullah Al Arafat, Adnan Siraj Rakin, Zhishan Guo, Nazanin Rahnavard · AAML · 01 Sep 2024

Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network
Yijun Zhou, Om Arora-Jain, Xia Jiang · OOD · 28 Aug 2024

Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews · 26 Aug 2024

Weight Scope Alignment: A Frustratingly Easy Method for Model Merging
Yichu Xu, Xin-Chun Li, Le Gan, De-Chuan Zhan · MoMe · 22 Aug 2024

A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning
Deyu Li, L. Xiao, Handi Wei, Yan Li, Binghua Zhang · 20 Aug 2024

Enhancing Adversarial Transferability with Adversarial Weight Tuning
Jiahao Chen, Zhou Feng, Rui Zeng, Yuwen Pu, Chunyi Zhou, Yi Jiang, Yuyou Gan, Jinbao Li, Shouling Ji · AAML · 18 Aug 2024

Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw, S. Stramaglia, Daniele Marinazzo · 16 Aug 2024

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Xinyi Zhang, Hanyu Zhao, Wencong Xiao, Chencan Wu, Fei Xu, Yong Li, Wei Lin, Fangming Liu · 16 Aug 2024

Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius
Xuehao Wang, Weisen Jiang, Shuai Fu, Yu Zhang · AAML · 15 Aug 2024

Implicit Neural Representation For Accurate CFD Flow Field Prediction
L. D. Vito, Nils Pinnau, Simone Dey · AI4CE · 12 Aug 2024

Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?
Mohamed Hassan, Aleksandar Vakanski, Min Xian · AAML, MedIm · 07 Aug 2024

Exploring Loss Landscapes through the Lens of Spin Glass Theory
Hao Liao, Wei Zhang, Zhanyi Huang, Zexiao Long, Mingyang Zhou, Xiaoqun Wu, Rui Mao, Chi Ho Yeung · 30 Jul 2024

Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Dennis Chemnitz, Maximilian Engel · 29 Jul 2024

Local vs Global continual learning
Giulia Lanzillotta, Sidak Pal Singh, Benjamin Grewe, Thomas Hofmann · CLL · 23 Jul 2024

Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance
Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang · 17 Jul 2024

Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator
Thinh Nguyen, Khoa D. Doan, Binh T. Nguyen, Danh Le-Phuoc, Kok-Seng Wong · FedML · 13 Jul 2024

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi, Albert Manuel Orozco Camacho, Eugene Belilovsky, Guy Wolf · FedML, MoMe · 07 Jul 2024

Multimodal Classification via Modal-Aware Interactive Enhancement
Qing-Yuan Jiang, Zhouyang Chi, Yang Yang · 05 Jul 2024

Simplifying Deep Temporal Difference Learning
Matteo Gallici, Mattie Fellows, Benjamin Ellis, B. Pou, Ivan Masmitja, Jakob Foerster, Mario Martin · OffRL · 05 Jul 2024

PaSE: Parallelization Strategies for Efficient DNN Training
Venmugil Elango · 04 Jul 2024

Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg, Matthias Hein · 04 Jul 2024

Curvature Clues: Decoding Deep Learning Privacy with Input Loss Curvature
Deepak Ravikumar, Efstathia Soufleri, Kaushik Roy · 03 Jul 2024

Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization
Hongjun Choi, Jayaraman J. Thiagarajan, Ruben Glatt, Shusen Liu · 29 Jun 2024

On the Trade-off between Flatness and Optimization in Distributed Learning
Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed · 28 Jun 2024

On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie · 3DGS · 26 Jun 2024

MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation
Liuyi Wang, Zongtao He, Mengjiao Shen, Jingwei Yang, Chengju Liu, Qijun Chen · VLM · 25 Jun 2024

Improving robustness to corruptions with multiplicative weight perturbations
Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski · 24 Jun 2024

MD tree: a model-diagnostic tree grown on loss landscape
Yefan Zhou, Jianlong Chen, Qinxue Cao, Konstantin Schürholt, Yaoqing Yang · 24 Jun 2024

Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution
Naoki Yoshida, Shogo H. Nakakita, Masaaki Imaizumi · 23 Jun 2024

DataFreeShield: Defending Adversarial Attacks without Training Data
Hyeyoon Lee, Kanghyun Choi, Dain Kwon, Sunjong Park, Mayoore S. Jaiswal, Noseong Park, Jonghyun Choi, Jinho Lee · 21 Jun 2024

Flat Posterior Does Matter For Bayesian Model Averaging
Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song · BDL, AAML · 21 Jun 2024

Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Tanapat Ratchatorn, Masayuki Tanaka · AAML · 20 Jun 2024

Information Guided Regularization for Fine-tuning Language Models
Mandar Sharma, Nikhil Muralidhar, Shengzhe Xu, Raquib Bin Yousuf, Naren Ramakrishnan · 20 Jun 2024

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar · 20 Jun 2024

DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection
Zhuoxiao Chen, Zixin Wang, Yadan Luo, Sen Wang, Zi Huang · AAML, 3DPC · 19 Jun 2024

Low-Resource Machine Translation through the Lens of Personalized Federated Learning
Viktor Moskvoretskii, N. Tupitsa, Chris Biemann, Samuel Horváth, Eduard A. Gorbunov, Irina Nikishina · FedML · 18 Jun 2024

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio · MLT · 17 Jun 2024

What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
Weijie Tu, Weijian Deng, Liang Zheng, Tom Gedeon · 14 Jun 2024

When Will Gradient Regularization Be Harmful?
Yang Zhao, Hao Zhang, Xiuyuan Hu · AI4CE · 14 Jun 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett · 12 Jun 2024

Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker–Planck Equation
Shuyu Yin, Fei Wen, Peilin Liu, Tao Luo · 12 Jun 2024

Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng, Junbiao Pang, Baochang Zhang · 12 Jun 2024

Agnostic Sharpness-Aware Minimization
Van-Anh Nguyen, Quyen Tran, Tuan Truong, Thanh-Toan Do, Dinh Q. Phung, Trung Le · 11 Jun 2024

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang · NoLa · 10 Jun 2024