arXiv:2411.00999
Cited By
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
1 November 2024
Gavia Gray
Aman Tiwari
Shane Bergsma
Joel Hestness
Papers citing "Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers"
No papers