On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective

Neural Information Processing Systems (NeurIPS), 2020
23 November 2020
Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama
arXiv:2011.11152 (abs / PDF / HTML)

Papers citing "On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective"

19 citing papers

Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
Zhiyuan Fan, Yifeng Liu, Qingyue Zhao, Angela Yuan, Quanquan Gu (17 Oct 2025)

Cautious Weight Decay
Lizhang Chen, Jonathan Li, Kaizhao Liang, Baiyu Su, Cong Xie, Nuo Wang Pierse, Chen Liang, Ni Lao, Qiang Liu (14 Oct 2025)

Self Identity Mapping
Neural Networks (NN), 2025
Xiuding Cai, Yaoyao Zhu, Linjie Fu, Dong Miao, Yu Yao (17 Sep 2025)

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
Di He, Ajay Jaiswal, Songjun Tu, Li Shen, Ganzhao Yuan, Shiwei Liu, L. Yin (17 Jun 2025)

Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
Thomas Pethick, Wanyun Xie, Mete Erdogan, Kimon Antonakopoulos, Tony Silveti-Falls, Volkan Cevher (02 Jun 2025)

Why Gradients Rapidly Increase Near the End of Training
Aaron Defazio (02 Jun 2025)

NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
Xinyu Zhou, Simin Fan, Martin Jaggi, Jie Fu (24 Apr 2025)

Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?
Tom Jacobs, Chao Zhou, R. Burkholz (17 Apr 2025)

Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou (10 Feb 2025)

Weight decay induces low-rank attention layers
Neural Information Processing Systems (NeurIPS), 2024
Seijin Kobayashi, Yassir Akram, J. Oswald (31 Oct 2024)

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
International Conference on Learning Representations (ICLR), 2024
Wenlong Deng, Yize Zhao, V. Vakilian, Minghui Chen, Xiaoxiao Li, Christos Thrampoulidis (12 Oct 2024)

mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, ..., Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang (29 Jul 2024)

Neural Field Classifiers via Target Encoding and Classification Loss
Xindi Yang, Zeke Xie, Xiong Zhou, Boyu Liu, Buhua Liu, Yi Liu, Haoran Wang, Yunfeng Cai, Mingming Sun (02 Mar 2024)

Neural Networks with (Low-Precision) Polynomial Approximations: New Insights and Techniques for Accuracy Improvement
Chi Zhang, Jingjing Fan, Man Ho Au, Siu-Ming Yiu (17 Feb 2024)

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
International Conference on Machine Learning (ICML), 2023
Atli Kosson, Bettina Messmer, Martin Jaggi (26 May 2023)

On the Overlooked Structure of Stochastic Gradients
Neural Information Processing Systems (NeurIPS), 2022
Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li (05 Dec 2022)

On effects of Knowledge Distillation on Transfer Learning
Sushil Thapa (18 Oct 2022)

Residual-Concatenate Neural Network with Deep Regularization Layers for Binary Classification
International Conference on Intelligent Computing and Control Systems (ICICCS), 2022
Abhishek Gupta, Sruthi Nair, Raunak Joshi, V. Chitre (25 May 2022)

Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein (29 Sep 2021)