Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He, Gao Huang, Yang Yuan [ODL, MLT]
arXiv:1902.00744, 2 February 2019
Papers citing "Asymmetric Valleys: Beyond Sharp and Flat Local Minima" (32 papers)
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner. 12 Jul 2023.
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli. 25 May 2023.
How to escape sharp minima with random perturbations
Kwangjun Ahn, Ali Jadbabaie, S. Sra [ODL]. 25 May 2023.
GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo. 15 May 2023.
An Adaptive Policy to Employ Sharpness-Aware Minimization
Weisen Jiang, Hansi Yang, Yu Zhang, James T. Kwok [AAML]. 28 Apr 2023.
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash, Ofir Lindenbaum. 05 Mar 2023.
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer
Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury [VLM]. 04 Mar 2023.
Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization
Hoki Kim, Jinseong Park, Yujin Choi, Woojin Lee, Jaewook Lee. 27 Jan 2023.
Training trajectories, mini-batch losses and the curious role of the learning rate
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller [ODL]. 05 Jan 2023.
The Vanishing Decision Boundary Complexity and the Strong First Component
Hengshuai Yao [UQCV]. 25 Nov 2022.
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo-Lu Zhao, I. Ganev, Robin G. Walters, Rose Yu, Nima Dehmamy. 31 Oct 2022.
Label driven Knowledge Distillation for Federated Learning with non-IID Data
Minh-Duong Nguyen, Viet Quoc Pham, D. Hoang, Long Tran-Thanh, Diep N. Nguyen, W. Hwang. 29 Sep 2022.
Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet. 06 Sep 2022.
A Closer Look at Smoothness in Domain Adversarial Training
Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Arihant Jain, R. Venkatesh Babu. 16 Jun 2022.
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora [FAtt]. 14 Jun 2022.
Towards Understanding Sharpness-Aware Minimization
Maksym Andriushchenko, Nicolas Flammarion [AAML]. 13 Jun 2022.
Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks
Zhiwei Bai, Tao Luo, Z. Xu, Yaoyu Zhang. 26 May 2022.
Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation
An Xu, Wenqi Li, Pengfei Guo, Dong Yang, H. Roth, Ali Hatamizadeh, Can Zhao, Daguang Xu, Heng-Chiao Huang, Ziyue Xu [FedML]. 18 Mar 2022.
When Do Flat Minima Optimizers Work?
Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner [ODL]. 01 Feb 2022.
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Z. Xu. 30 Nov 2021.
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi. 07 Nov 2021.
Shift-Curvature, SGD, and Generalization
Arwen V. Bradley, C. Gomez-Uribe, Manish Reddy Vuyyuru. 21 Aug 2021.
SelfReg: Self-supervised Contrastive Regularization for Domain Generalization
Daehee Kim, Seunghyun Park, Jinkyu Kim, Jaekoo Lee [OOD, SSL]. 20 Apr 2021.
A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label Complexity
Seo Taek Kong, Soomin Jeon, Dongbin Na, Jaewon Lee, Honglak Lee, Kyu-Hwan Jung. 08 Apr 2021.
08 Apr 2021
Formal Language Theory Meets Modern NLP
William Merrill [AI4CE, NAI]. 19 Feb 2021.
A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol, Nicholas P. Baskerville [AI4CE, ODL]. 15 Nov 2020.
Improving Neural Network Training in Low Dimensional Random Bases
Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi. 09 Nov 2020.
Directional Pruning of Deep Neural Networks
Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng [ODL]. 16 Jun 2020.
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama [ODL]. 10 Feb 2020.
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, A. Wilson. 14 Jun 2018.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL]. 15 Sep 2016.