Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

26 May 2023

Papers citing "Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks"


Title | Authors | Tags | Counts | Date
Grokking at the Edge of Numerical Stability | Lucas Prieto, Melih Barsbey, Pedro A.M. Mediano, Tolga Birdal | | 34 3 0 | 08 Jan 2025
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training | Atli Kosson, Bettina Messmer, Martin Jaggi | AI4CE | 18 2 0 | 31 Oct 2024
How Does Critical Batch Size Scale in Pre-training? | Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade | | 75 8 0 | 29 Oct 2024
Pyramid Vector Quantization for LLMs | Tycho F. A. van der Ouderaa, Maximilian L. Croci, Agrin Hilmkil, James Hensman | MQ | 29 1 0 | 22 Oct 2024
How to set AdamW's weight decay as you scale model and dataset size | Xi Wang, Laurence Aitchison | | 38 9 0 | 22 May 2024
Learning in PINNs: Phase transition, total diffusion, and generalization | Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Karniadakis | | 24 10 0 | 27 Mar 2024
Analyzing and Improving the Training Dynamics of Diffusion Models | Tero Karras, M. Aittala, J. Lehtinen, Janne Hellsten, Timo Aila, S. Laine | | 28 153 0 | 05 Dec 2023
Why Do We Need Weight Decay in Modern Deep Learning? | Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion | | 26 27 0 | 06 Oct 2023
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes | M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov | | 39 15 0 | 08 Sep 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms | Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora | | 92 40 0 | 20 May 2022
Learning by Turning: Neural Architecture Aware Optimisation | Yang Liu, Jeremy Bernstein, M. Meister, Yisong Yue | ODL | 39 26 0 | 14 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization | Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan | VLM | 223 512 0 | 11 Feb 2021
ImageNet Large Scale Visual Recognition Challenge | Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei | VLM, ObjD | 296 39,194 0 | 01 Sep 2014