arXiv:2305.17212
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
26 May 2023
Atli Kosson
Bettina Messmer
Martin Jaggi
Papers citing "Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks"
13 / 13 papers shown
Title
Grokking at the Edge of Numerical Stability
Lucas Prieto
Melih Barsbey
Pedro A.M. Mediano
Tolga Birdal
34
3
0
08 Jan 2025
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson
Bettina Messmer
Martin Jaggi
AI4CE
18
2
0
31 Oct 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
75
8
0
29 Oct 2024
Pyramid Vector Quantization for LLMs
Tycho F. A. van der Ouderaa
Maximilian L. Croci
Agrin Hilmkil
James Hensman
MQ
29
1
0
22 Oct 2024
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang
Laurence Aitchison
38
9
0
22 May 2024
Learning in PINNs: Phase transition, total diffusion, and generalization
Sokratis J. Anagnostopoulos
Juan Diego Toscano
Nikolaos Stergiopulos
George Karniadakis
24
10
0
27 Mar 2024
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras
M. Aittala
J. Lehtinen
Janne Hellsten
Timo Aila
S. Laine
28
153
0
05 Dec 2023
Why Do We Need Weight Decay in Modern Deep Learning?
Maksym Andriushchenko
Francesco D'Angelo
Aditya Varre
Nicolas Flammarion
24
27
0
06 Oct 2023
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
M. Kodryan
E. Lobacheva
M. Nakhodnov
Dmitry Vetrov
36
15
0
08 Sep 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Sadhika Malladi
Kaifeng Lyu
A. Panigrahi
Sanjeev Arora
92
40
0
20 May 2022
Learning by Turning: Neural Architecture Aware Optimisation
Yang Liu
Jeremy Bernstein
M. Meister
Yisong Yue
ODL
39
26
0
14 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,194
0
01 Sep 2014