2405.14813
Cited By
Scalable Optimization in the Modular Norm
23 May 2024
Tim Large
Yang Liu
Minyoung Huh
Hyojin Bahng
Phillip Isola
Jeremy Bernstein
Papers citing
"Scalable Optimization in the Modular Norm"
10 / 10 papers shown
Title
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Bill Li
Blake Bordelon
Shane Bergsma
C. Pehlevan
Boris Hanin
Joel Hestness
39
0
0
02 May 2025
Function-Space Learning Rates
Edward Milsom
Ben Anson
Laurence Aitchison
47
1
0
24 Feb 2025
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
44
0
0
21 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
104
0
0
21 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
36
3
0
10 Jan 2025
Modular Duality in Deep Learning
Jeremy Bernstein
Laker Newhouse
22
2
0
28 Oct 2024
Old Optimizer, New Norm: An Anthology
Jeremy Bernstein
Laker Newhouse
ODL
36
12
0
30 Sep 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
51
9
0
24 Jul 2024
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
153
232
0
04 Mar 2020
On the distance between two neural networks and the stability of learning
Jeremy Bernstein
Arash Vahdat
Yisong Yue
Ming-Yu Liu
ODL
190
57
0
09 Feb 2020