Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.08529
Cited By
Sharpness-Aware Minimization Improves Language Model Generalization
16 October 2021
Dara Bahri
H. Mobahi
Yi Tay
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Sharpness-Aware Minimization Improves Language Model Generalization"
21 / 21 papers shown
Title
Elucidating the Design Space of Dataset Condensation
Shitong Shao
Zikai Zhou
Huanran Chen
Zhiqiang Shen
DD
54
7
0
20 Jan 2025
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
91
0
0
18 Dec 2024
Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?
Mohamed Hassan
Aleksandar Vakanski
Min Xian
AAML
MedIm
30
1
0
07 Aug 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
64
1
0
12 Jun 2024
Agnostic Sharpness-Aware Minimization
Van-Anh Nguyen
Quyen Tran
Tuan Truong
Thanh-Toan Do
Dinh Q. Phung
Trung Le
32
0
0
11 Jun 2024
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
Yang Ye
Shihao Ji
PINN
3DPC
22
0
0
19 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
Rui Li
Martin Trapp
Marcus Klasson
Arno Solin
33
0
0
11 Apr 2024
From Robustness to Improved Generalization and Calibration in Pre-trained Language Models
Josip Jukić
Jan Snajder
18
0
0
31 Mar 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
65
5
0
22 Jan 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
36
1
0
29 Nov 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si
Chulhee Yun
26
15
0
16 Jun 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
16
6
0
25 May 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
S. Keerthi
Ayan Acharya
Borja Ocejo
Gregory Dexter
Rajiv Khanna
D. Durfee
Rahul Mazumder
AAML
10
7
0
19 Feb 2023
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Atish Agarwala
Yann N. Dauphin
11
20
0
17 Feb 2023
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Peter L. Bartlett
Philip M. Long
Olivier Bousquet
57
34
0
04 Oct 2022
Sharpness-Aware Minimization with Dynamic Reweighting
Wenxuan Zhou
Fangyu Liu
Huan Zhang
Muhao Chen
AAML
19
8
0
16 Dec 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
239
2,554
0
04 May 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
220
450
0
11 Feb 2021
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu
Yu Cheng
Zhe Gan
S. Sun
Tom Goldstein
Jingjing Liu
AAML
205
430
0
25 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
273
2,696
0
15 Sep 2016
1