On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama
23 November 2020
arXiv: 2011.11152
Papers citing "On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective" (6 of 6 papers shown)
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
Xinyu Zhou, Simin Fan, Martin Jaggi, Jie Fu
24 Apr 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou
10 Feb 2025
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng, Yize Zhao, V. Vakilian, Minghui Chen, Xiaoxiao Li, Christos Thrampoulidis
12 Oct 2024
On the Overlooked Structure of Stochastic Gradients
Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li
05 Dec 2022
Residual-Concatenate Neural Network with Deep Regularization Layers for Binary Classification
Abhishek Gupta, Sruthi Nair, Raunak Joshi, V. Chitre
25 May 2022
Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
29 Sep 2021