Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

arXiv:2006.15815
29 June 2020
Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
ODL

Papers citing "Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum" (9 of 9 papers shown)
1. Do we really have to filter out random noise in pre-training data for language models?
   Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou
   10 Feb 2025

2. Decoupled Knowledge with Ensemble Learning for Online Distillation
   Baitan Shao, Ying Chen
   18 Dec 2023

3. Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
   Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He
   21 May 2023

4. Stability Analysis of Sharpness-Aware Minimization
   Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee
   16 Jan 2023

5. On the Overlooked Structure of Stochastic Gradients
   Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li
   05 Dec 2022

6. On the optimization and pruning for Bayesian deep learning
   X. Ke, Yanan Fan
   BDL, UQCV
   24 Oct 2022

7. Meta Knowledge Condensation for Federated Learning
   Ping Liu, Xin Yu, Joey Tianyi Zhou
   DD, FedML
   29 Sep 2022

8. Sparse Double Descent: Where Network Pruning Aggravates Overfitting
   Zhengqi He, Zeke Xie, Quanzhi Zhu, Zengchang Qin
   17 Jun 2022

9. On the Power-Law Hessian Spectrums in Deep Learning
   Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li
   ODL
   31 Jan 2022