Scalable Optimization in the Modular Norm

23 May 2024

Yang Liu

Jeremy Bernstein

Papers citing "Scalable Optimization in the Modular Norm"

Title
Don't be lazy: CompleteP enables compute-efficient deep transformers Nolan Dey Bin Claire Zhang Lorenzo Noci Mufan Bill Li Blake Bordelon Shane Bergsma C. Pehlevan Boris Hanin Joel Hestness 39 0 0 02 May 2025
Function-Space Learning Rates Edward Milsom Ben Anson Laurence Aitchison 47 1 0 24 Feb 2025
Physics of Skill Learning Ziming Liu Yizhou Liu Eric J. Michaud Jeff Gore Max Tegmark 44 0 0 21 Jan 2025
FOCUS: First Order Concentrated Updating Scheme Yizhou Liu Ziming Liu Jeff Gore ODL 104 0 0 21 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit Oleg Filatov Jan Ebert Jiangtao Wang Stefan Kesselheim 36 3 0 10 Jan 2025
Modular Duality in Deep Learning Jeremy Bernstein Laker Newhouse 22 2 0 28 Oct 2024
Old Optimizer, New Norm: An Anthology Jeremy Bernstein Laker Newhouse ODL 36 12 0 30 Sep 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization Charlie Blake C. Eichenberg Josef Dean Lukas Balles Luke Y. Prince Bjorn Deiseroth Andres Felipe Cruz Salinas Carlo Luschi Samuel Weinbach Douglas Orr 51 9 0 24 Jul 2024
The large learning rate phase of deep learning: the catapult mechanism Aitor Lewkowycz Yasaman Bahri Ethan Dyer Jascha Narain Sohl-Dickstein Guy Gur-Ari ODL 153 232 0 04 Mar 2020
On the distance between two neural networks and the stability of learning Jeremy Bernstein Arash Vahdat Yisong Yue Ming-Yu Liu ODL 190 57 0 09 Feb 2020