1609.04836
Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 1,585 papers shown
Hidden Activations Are Not Enough: A General Approach to Neural Network Predictions
Samuel Leblanc
Aiky Rasolomanana
Marco Armenta
106
0
0
20 Sep 2024
Efficient Training of Deep Neural Operator Networks via Randomized Sampling
Sharmila Karumuri
Lori Graham-Brady
Somdatta Goswami
105
2
0
20 Sep 2024
Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
Hinata Harada
Hideaki Iiduka
92
1
0
16 Sep 2024
WaterMAS: Sharpness-Aware Maximization for Neural Network Watermarking
Carl De Sousa Trias
Mihai P. Mitrea
Attilio Fiandrotti
Marco Cagnazzo
Sumanta Chaudhuri
Enzo Tartaglione
AAML
85
1
0
05 Sep 2024
Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization
Nayeong Kim
Juwon Kang
Sungsoo Ahn
Jungseul Ok
Suha Kwak
96
1
0
05 Sep 2024
CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng
Xi Chen
Yuwen Pu
Xuhong Zhang
Tianyu Du
Shouling Ji
145
11
0
02 Sep 2024
Fisher Information guided Purification against Backdoor Attacks
Nazmul Karim
Abdullah Al Arafat
Adnan Siraj Rakin
Zhishan Guo
Nazanin Rahnavard
AAML
144
3
0
01 Sep 2024
Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network
Yijun Zhou
Om Arora-Jain
Xia Jiang
OOD
102
3
0
28 Aug 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller
Mark Dredze
Nicholas Andrews
206
2
0
26 Aug 2024
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging
Yichu Xu
Xin-Chun Li
Le Gan
De-Chuan Zhan
MoMe
125
0
0
22 Aug 2024
A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning
Deyu Li
L. Xiao
Handi Wei
Yan Li
Binghua Zhang
84
0
0
20 Aug 2024
Enhancing Adversarial Transferability with Adversarial Weight Tuning
Jiahao Chen
Zhou Feng
Rui Zeng
Yuwen Pu
Chunyi Zhou
Yi Jiang
Yuyou Gan
Jinbao Li
Shouling Ji
AAML
134
3
0
18 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
92
6
0
16 Aug 2024
Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
Xinyi Zhang
Hanyu Zhao
Wencong Xiao
Chencan Wu
Fei Xu
Yong Li
Wei Lin
Fangming Liu
77
2
0
16 Aug 2024
Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius
Xuehao Wang
Weisen Jiang
Shuai Fu
Yu Zhang
AAML
117
1
0
15 Aug 2024
Implicit Neural Representation For Accurate CFD Flow Field Prediction
L. D. Vito
Nils Pinnau
Simone Dey
AI4CE
136
1
0
12 Aug 2024
Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?
Mohamed Hassan
Aleksandar Vakanski
Min Xian
AAML
MedIm
133
1
0
07 Aug 2024
Exploring Loss Landscapes through the Lens of Spin Glass Theory
Hao Liao
Wei Zhang
Zhanyi Huang
Zexiao Long
Mingyang Zhou
Xiaoqun Wu
Rui Mao
Chi Ho Yeung
122
2
0
30 Jul 2024
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Dennis Chemnitz
Maximilian Engel
114
1
0
29 Jul 2024
Local vs Global continual learning
Giulia Lanzillotta
Sidak Pal Singh
Benjamin Grewe
Thomas Hofmann
CLL
113
0
0
23 Jul 2024
Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance
Haiquan Lu
Xiaotian Liu
Yefan Zhou
Qunli Li
Kurt Keutzer
Michael W. Mahoney
Yujun Yan
Huanrui Yang
Yaoqing Yang
88
1
0
17 Jul 2024
Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator
Thinh Nguyen
Khoa D. Doan
Binh T. Nguyen
Danh Le-Phuoc
Kok-Seng Wong
FedML
101
0
0
13 Jul 2024
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Eugene Belilovsky
Guy Wolf
FedML
MoMe
106
10
0
07 Jul 2024
Multimodal Classification via Modal-Aware Interactive Enhancement
Qing-Yuan Jiang
Zhouyang Chi
Yang Yang
93
3
0
05 Jul 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
251
39
0
05 Jul 2024
PaSE: Parallelization Strategies for Efficient DNN Training
Venmugil Elango
71
11
0
04 Jul 2024
Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg
Matthias Hein
139
0
0
04 Jul 2024
Curvature Clues: Decoding Deep Learning Privacy with Input Loss Curvature
Deepak Ravikumar
Efstathia Soufleri
Kaushik Roy
95
2
0
03 Jul 2024
Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization
Hongjun Choi
Jayaraman J. Thiagarajan
Ruben Glatt
Shusen Liu
119
1
0
29 Jun 2024
On the Trade-off between Flatness and Optimization in Distributed Learning
Ying Cao
Zhaoxian Wu
Kun Yuan
Ali H. Sayed
148
3
0
28 Jun 2024
On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao
Haoyang Weng
Daohan Lu
Ang Li
Jinyang Li
Aurojit Panda
Saining Xie
3DGS
97
23
0
26 Jun 2024
MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation
Liuyi Wang
Zongtao He
Mengjiao Shen
Jingwei Yang
Chengju Liu
Qijun Chen
VLM
137
3
0
25 Jun 2024
Improving robustness to corruptions with multiplicative weight perturbations
Trung Trinh
Markus Heinonen
Luigi Acerbi
Samuel Kaski
107
1
0
24 Jun 2024
MD tree: a model-diagnostic tree grown on loss landscape
Yefan Zhou
Jianlong Chen
Qinxue Cao
Konstantin Schürholt
Yaoqing Yang
143
2
0
24 Jun 2024
Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution
Naoki Yoshida
Shogo H. Nakakita
Masaaki Imaizumi
89
1
0
23 Jun 2024
DataFreeShield: Defending Adversarial Attacks without Training Data
Hyeyoon Lee
Kanghyun Choi
Dain Kwon
Sunjong Park
Mayoore S. Jaiswal
Noseong Park
Jonghyun Choi
Jinho Lee
109
0
0
21 Jun 2024
Flat Posterior Does Matter For Bayesian Model Averaging
Sungjun Lim
Jeyoon Yeom
Sooyon Kim
Hoyoon Byun
Jinho Kang
Yohan Jung
Jiyoung Jung
Kyungwoo Song
BDL
AAML
241
0
0
21 Jun 2024
Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Tanapat Ratchatorn
Masayuki Tanaka
AAML
127
1
0
20 Jun 2024
Information Guided Regularization for Fine-tuning Language Models
Mandar Sharma
Nikhil Muralidhar
Shengzhe Xu
Raquib Bin Yousuf
Naren Ramakrishnan
141
0
0
20 Jun 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
118
3
0
20 Jun 2024
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection
Zhuoxiao Chen
Zixin Wang
Yadan Luo
Sen Wang
Zi Huang
AAML
3DPC
86
2
0
19 Jun 2024
Low-Resource Machine Translation through the Lens of Personalized Federated Learning
Viktor Moskvoretskii
N. Tupitsa
Chris Biemann
Samuel Horváth
Eduard A. Gorbunov
Irina Nikishina
FedML
110
0
0
18 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
96
2
0
17 Jun 2024
What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
Weijie Tu
Weijian Deng
Liang Zheng
Tom Gedeon
124
1
0
14 Jun 2024
When Will Gradient Regularization Be Harmful?
Yang Zhao
Hao Zhang
Xiuyuan Hu
AI4CE
93
2
0
14 Jun 2024
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai
Jingfeng Wu
Song Mei
Michael Lindsey
Peter L. Bartlett
125
5
0
12 Jun 2024
Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker–Planck Equation
Shuyu Yin
Fei Wen
Peilin Liu
Tao Luo
87
0
0
12 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
196
2
0
12 Jun 2024
Agnostic Sharpness-Aware Minimization
Van-Anh Nguyen
Quyen Tran
Tuan Truong
Thanh-Toan Do
Dinh Q. Phung
Trung Le
136
0
0
11 Jun 2024
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao
Kaiqi Zhang
Esha Singh
Daniel Soudry
Yu-Xiang Wang
NoLa
109
4
0
10 Jun 2024