Asymmetric Valleys: Beyond Sharp and Flat Local Minima
2 February 2019
Haowei He, Gao Huang, Yang Yuan
ODL, MLT

Papers citing "Asymmetric Valleys: Beyond Sharp and Flat Local Minima"

32 / 32 papers shown

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
25 May 2024

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
12 Jul 2023

SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli
25 May 2023

How to escape sharp minima with random perturbations
Kwangjun Ahn, Ali Jadbabaie, S. Sra
ODL
25 May 2023

GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Y. Yoo
15 May 2023

An Adaptive Policy to Employ Sharpness-Aware Minimization
Weisen Jiang, Hansi Yang, Yu Zhang, James T. Kwok
AAML
28 Apr 2023

Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash, Ofir Lindenbaum
05 Mar 2023

DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer
Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury
VLM
04 Mar 2023

Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization
Hoki Kim, Jinseong Park, Yujin Choi, Woojin Lee, Jaewook Lee
27 Jan 2023

Training trajectories, mini-batch losses and the curious role of the learning rate
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller
ODL
05 Jan 2023

The Vanishing Decision Boundary Complexity and the Strong First Component
Hengshuai Yao
UQCV
25 Nov 2022

Symmetries, flat minima, and the conserved quantities of gradient flow
Bo-Lu Zhao, I. Ganev, Robin G. Walters, Rose Yu, Nima Dehmamy
31 Oct 2022

Label driven Knowledge Distillation for Federated Learning with non-IID Data
Minh-Duong Nguyen, Viet Quoc Pham, D. Hoang, Long Tran-Thanh, Diep N. Nguyen, W. Hwang
29 Sep 2022

Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet
06 Sep 2022

A Closer Look at Smoothness in Domain Adversarial Training
Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Arihant Jain, R. Venkatesh Babu
16 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
FAtt
14 Jun 2022

Towards Understanding Sharpness-Aware Minimization
Maksym Andriushchenko, Nicolas Flammarion
AAML
13 Jun 2022

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks
Zhiwei Bai, Tao Luo, Z. Xu, Yaoyu Zhang
26 May 2022

Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation
An Xu, Wenqi Li, Pengfei Guo, Dong Yang, H. Roth, Ali Hatamizadeh, Can Zhao, Daguang Xu, Heng-Chiao Huang, Ziyue Xu
FedML
18 Mar 2022

When Do Flat Minima Optimizers Work?
Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner
ODL
01 Feb 2022

Embedding Principle: a hierarchical structure of loss landscape of deep neural networks
Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Z. Xu
30 Nov 2021

Exponential escape efficiency of SGD from sharp minima in non-stationary regime
Hikaru Ibayashi, Masaaki Imaizumi
07 Nov 2021

Shift-Curvature, SGD, and Generalization
Arwen V. Bradley, C. Gomez-Uribe, Manish Reddy Vuyyuru
21 Aug 2021

SelfReg: Self-supervised Contrastive Regularization for Domain Generalization
Daehee Kim, Seunghyun Park, Jinkyu Kim, Jaekoo Lee
OOD, SSL
20 Apr 2021

A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label Complexity
Seo Taek Kong, Soomin Jeon, Dongbin Na, Jaewon Lee, Honglak Lee, Kyu-Hwan Jung
08 Apr 2021

Formal Language Theory Meets Modern NLP
William Merrill
AI4CE, NAI
19 Feb 2021

A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol, Nicholas P. Baskerville
AI4CE, ODL
15 Nov 2020

Improving Neural Network Training in Low Dimensional Random Bases
Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi
09 Nov 2020

Directional Pruning of Deep Neural Networks
Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
ODL
16 Jun 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama
ODL
10 Feb 2020

There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, A. Wilson
14 Jun 2018

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016