Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.00565
Cited By
AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks
1 March 2023
Hao Sun
Li Shen
Qihuang Zhong
Liang Ding
Shi-Yong Chen
Jingwei Sun
Jing Li
Guangzhong Sun
Dacheng Tao
Re-assign community
ArXiv
PDF
HTML
Papers citing
"AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks"
10 / 10 papers shown
Title
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
13
7
0
15 Oct 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si
Chulhee Yun
18
15
0
16 Jun 2023
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Qihuang Zhong
Liang Ding
Juhua Liu
Xuebo Liu
Min Zhang
Bo Du
Dacheng Tao
VLM
14
9
0
24 May 2023
Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
Zhuo Huang
Miaoxi Zhu
Xiaobo Xia
Li Shen
Jun Yu
Chen Gong
Bo Han
Bo Du
Tongliang Liu
30
30
0
23 Mar 2023
FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy
Yan Sun
Li Shen
Tiansheng Huang
Liang Ding
Dacheng Tao
FedML
29
50
0
21 Feb 2023
Efficient-Adam: Communication-Efficient Distributed Adam
Congliang Chen
Li Shen
Wei Liu
Z. Luo
8
19
0
28 May 2022
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Jiawei Du
Hanshu Yan
Jiashi Feng
Joey Tianyi Zhou
Liangli Zhen
Rick Siow Mong Goh
Vincent Y. F. Tan
AAML
99
132
0
07 Oct 2021
Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Congliang Chen
Li Shen
Fangyu Zou
Wei Liu
28
19
0
14 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
273
2,696
0
15 Sep 2016
1