Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He, Gao Huang, Yang Yuan [ODL, MLT]
arXiv:1902.00744, 2 February 2019
Papers citing "Asymmetric Valleys: Beyond Sharp and Flat Local Minima" (32 papers)
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner. 12 Jul 2023.
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli. 25 May 2023.
How to escape sharp minima with random perturbations
Kwangjun Ahn, Ali Jadbabaie, S. Sra [ODL]. 25 May 2023.
GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo. 15 May 2023.
An Adaptive Policy to Employ Sharpness-Aware Minimization
Weisen Jiang, Hansi Yang, Yu Zhang, James T. Kwok [AAML]. 28 Apr 2023.
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash, Ofir Lindenbaum. 05 Mar 2023.
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer
Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury [VLM]. 04 Mar 2023.
Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization
Hoki Kim, Jinseong Park, Yujin Choi, Woojin Lee, Jaewook Lee. 27 Jan 2023.
Training trajectories, mini-batch losses and the curious role of the learning rate
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller [ODL]. 05 Jan 2023.
The Vanishing Decision Boundary Complexity and the Strong First Component
Hengshuai Yao [UQCV]. 25 Nov 2022.
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo-Lu Zhao, I. Ganev, Robin G. Walters, Rose Yu, Nima Dehmamy. 31 Oct 2022.
Label driven Knowledge Distillation for Federated Learning with non-IID Data
Minh-Duong Nguyen, Viet Quoc Pham, D. Hoang, Long Tran-Thanh, Diep N. Nguyen, W. Hwang. 29 Sep 2022.
Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet. 06 Sep 2022.
A Closer Look at Smoothness in Domain Adversarial Training
Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Arihant Jain, R. Venkatesh Babu. 16 Jun 2022.
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora [FAtt]. 14 Jun 2022.
Towards Understanding Sharpness-Aware Minimization
Maksym Andriushchenko, Nicolas Flammarion [AAML]. 13 Jun 2022.
Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks
Zhiwei Bai, Tao Luo, Z. Xu, Yaoyu Zhang. 26 May 2022.
Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation
An Xu, Wenqi Li, Pengfei Guo, Dong Yang, H. Roth, Ali Hatamizadeh, Can Zhao, Daguang Xu, Heng-Chiao Huang, Ziyue Xu [FedML]. 18 Mar 2022.
When Do Flat Minima Optimizers Work?
Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner [ODL]. 01 Feb 2022.
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Z. Xu. 30 Nov 2021.
Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi. 07 Nov 2021.
Shift-Curvature, SGD, and Generalization
Arwen V. Bradley, C. Gomez-Uribe, Manish Reddy Vuyyuru. 21 Aug 2021.
SelfReg: Self-supervised Contrastive Regularization for Domain Generalization
Daehee Kim, Seunghyun Park, Jinkyu Kim, Jaekoo Lee [OOD, SSL]. 20 Apr 2021.
A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label Complexity
Seo Taek Kong, Soomin Jeon, Dongbin Na, Jaewon Lee, Honglak Lee, Kyu-Hwan Jung. 08 Apr 2021.
08 Apr 2021
Formal Language Theory Meets Modern NLP
William Merrill [AI4CE, NAI]. 19 Feb 2021.
A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol, Nicholas P. Baskerville [AI4CE, ODL]. 15 Nov 2020.
Improving Neural Network Training in Low Dimensional Random Bases
Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi. 09 Nov 2020.
Directional Pruning of Deep Neural Networks
Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng [ODL]. 16 Jun 2020.
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama [ODL]. 10 Feb 2020.
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, A. Wilson. 14 Jun 2018.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL]. 15 Sep 2016.