Meta-Principled Family of Hyperparameter Scaling Strategies

10 October 2022

Papers citing "Meta-Principled Family of Hyperparameter Scaling Strategies"

9 / 9 papers shown

Title
Don't be lazy: CompleteP enables compute-efficient deep transformers Nolan Dey Bin Claire Zhang Lorenzo Noci Mufan Bill Li Blake Bordelon Shane Bergsma C. Pehlevan Boris Hanin Joel Hestness 37 0 0 02 May 2025
Deep Neural Nets as Hamiltonians Mike Winer Boris Hanin 41 0 0 31 Mar 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos Dayal Singh Kalra Tianyu He M. Barkeshli 47 4 0 17 Feb 2025
Small-scale proxies for large-scale Transformer training instabilities Mitchell Wortsman Peter J. Liu Lechao Xiao Katie Everett A. Alemi ... Jascha Narain Sohl-Dickstein Kelvin Xu Jaehoon Lee Justin Gilmer Simon Kornblith 24 80 0 25 Sep 2023
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks Blake Bordelon C. Pehlevan MLT 24 29 0 06 Apr 2023
Effective Theory of Transformers at Initialization Emily Dinan Sho Yaida Susan Zhang 20 14 0 04 Apr 2023
Zero-Shot Text-to-Image Generation Aditya A. Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen Ilya Sutskever VLM 253 4,735 0 24 Feb 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 220 4,424 0 23 Jan 2020
Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach Grant M. Rotskoff Eric Vanden-Eijnden 56 114 0 02 May 2018