2405.13698
How to set AdamW's weight decay as you scale model and dataset size
22 May 2024
Xi Wang
Laurence Aitchison
Papers citing "How to set AdamW's weight decay as you scale model and dataset size"
4 / 4 papers shown
Title
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Bill Li
Blake Bordelon
Shane Bergsma
C. Pehlevan
Boris Hanin
Joel Hestness
37
0
0
02 May 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
104
0
0
21 Jan 2025
Scaling Optimal LR Across Token Horizons
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
41
4
0
30 Sep 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
46
9
0
24 Jul 2024