ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

Scalable Optimization in the Modular Norm

Neural Information Processing Systems (NeurIPS), 2024
23 May 2024
Tim Large
Yang Liu
Minyoung Huh
Hyojin Bahng
Phillip Isola
Jeremy Bernstein
arXiv (abs), PDF, HTML, GitHub (194★)

Papers citing "Scalable Optimization in the Modular Norm"

19 / 19 papers shown
Turbo-Muon: Accelerating Orthogonality-Based Optimization with Pre-Conditioning
Thibaut Boissin
Thomas Massena
Franck Mamalet
M. Serrurier
157
1
0
04 Dec 2025
FedMuon: Accelerating Federated Learning with Matrix Orthogonalization
Junkang Liu
Fanhua Shang
Junchao Zhou
Hongying Liu
Yuanyuan Liu
Jin Liu
FedML
277
7
0
31 Oct 2025
How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data
Bhavya Vasudeva
Puneesh Deora
Yize Zhao
Vatsal Sharan
Christos Thrampoulidis
343
3
0
27 Oct 2025
Noise-Adaptive Layerwise Learning Rates: Accelerating Geometry-Aware Optimization for Deep Neural Network Training
Jie Hao
Xiaochuan Gong
Jie Xu
Z. Wang
Mingrui Liu
AI4CE
203
1
0
15 Oct 2025
An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants
M. Crawshaw
Chirag Modi
Mingrui Liu
Robert Mansel Gower
180
8
0
10 Oct 2025
Optimal Scaling Needs Optimal Norm
Oleg Filatov
Jiangtao Wang
J. Ebert
Stefan Kesselheim
241
3
0
04 Oct 2025
Beyond Outliers: A Study of Optimizers Under Quantization
Georgios Vlassis
Saleh Ashkboos
Alexandra Volkova
Torsten Hoefler
Dan Alistarh
MQ
311
4
0
27 Sep 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans
Sergey Levine
Pieter Abbeel
513
8
0
08 Jun 2025
Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
Thomas Pethick
Wanyun Xie
Mete Erdogan
Kimon Antonakopoulos
Tony Silveti-Falls
Volkan Cevher
387
9
0
02 Jun 2025
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
Artem Riabinin
Egor Shulgin
Kaja Gruntkowska
Peter Richtárik
AI4CE
534
35
0
19 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
705
38
0
02 May 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Boyao Wang
Shiqian Ma
Tong Zhang
ODL
551
38
0
26 Mar 2025
Function-Space Learning Rates
Edward Milsom
Ben Anson
Laurence Aitchison
536
3
0
24 Feb 2025
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
409
4
0
21 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
427
4
0
21 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
423
8
0
10 Jan 2025
Modular Duality in Deep Learning
Jeremy Bernstein
Laker Newhouse
212
40
0
28 Oct 2024
Old Optimizer, New Norm: An Anthology
Jeremy Bernstein
Laker Newhouse
ODL
391
96
0
30 Sep 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
407
20
0
24 Jul 2024