On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama
23 November 2020
arXiv: 2011.11152
Papers citing "On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective" (6 of 6 papers shown)
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
Xinyu Zhou, Simin Fan, Martin Jaggi, Jie Fu
24 Apr 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou
10 Feb 2025
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng, Yize Zhao, V. Vakilian, Minghui Chen, Xiaoxiao Li, Christos Thrampoulidis
12 Oct 2024
On the Overlooked Structure of Stochastic Gradients
Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li
05 Dec 2022
Residual-Concatenate Neural Network with Deep Regularization Layers for Binary Classification
Abhishek Gupta, Sruthi Nair, Raunak Joshi, V. Chitre
25 May 2022
Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
29 Sep 2021