The Crucial Role of Normalization in Sharpness-Aware Minimization

24 May 2023

Papers citing "The Crucial Role of Normalization in Sharpness-Aware Minimization"

16 / 16 papers shown

Title
From Gradient Clipping to Normalization for Heavy Tailed SGD Florian Hübler Ilyas Fatkhullin Niao He 37 5 0 17 Oct 2024
Does SGD really happen in tiny subspaces? Minhak Song Kwangjun Ahn Chulhee Yun 47 4 1 25 May 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead Marlon Becker Frederick Altrock Benjamin Risse 68 5 0 22 Jan 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization Sungbin Shin Dongyeop Lee Maksym Andriushchenko Namhoon Lee AAML 36 1 0 29 Nov 2023
Sharpness-Aware Minimization and the Edge of Stability Philip M. Long Peter L. Bartlett AAML 22 9 0 21 Sep 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima Dongkuk Si Chulhee Yun 26 15 0 16 Jun 2023
How to escape sharp minima with random perturbations Kwangjun Ahn Ali Jadbabaie S. Sra ODL 19 6 0 25 May 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding Yuchen Li Yuan-Fang Li Andrej Risteski 98 61 0 07 Mar 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization Kayhan Behdin Qingquan Song Aman Gupta S. Keerthi Ayan Acharya Borja Ocejo Gregory Dexter Rajiv Khanna D. Durfee Rahul Mazumder AAML 13 7 0 19 Feb 2023
Learning threshold neurons via the "edge of stability" Kwangjun Ahn Sébastien Bubeck Sinho Chewi Y. Lee Felipe Suarez Yi Zhang MLT 28 36 0 14 Dec 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima Peter L. Bartlett Philip M. Long Olivier Bousquet 60 34 0 04 Oct 2022
Understanding Gradient Descent on Edge of Stability in Deep Learning Sanjeev Arora Zhiyuan Li A. Panigrahi MLT 72 88 0 19 May 2022
Sharpness-Aware Minimization Improves Language Model Generalization Dara Bahri H. Mobahi Yi Tay 119 82 0 16 Oct 2021
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks Jiawei Du Hanshu Yan Jiashi Feng Joey Tianyi Zhou Liangli Zhen Rick Siow Mong Goh Vincent Y. F. Tan AAML 99 132 0 07 Oct 2021
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. Keskar Dheevatsa Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang ODL 273 2,696 0 15 Sep 2016
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition Hamed Karimi J. Nutini Mark W. Schmidt 119 1,190 0 16 Aug 2016