Scalable Optimization in the Modular Norm

Neural Information Processing Systems (NeurIPS), 2024

23 May 2024

Yang Liu

Jeremy Bernstein

Papers citing "Scalable Optimization in the Modular Norm"

19 / 19 papers shown

Turbo-Muon: Accelerating Orthogonality-Based Optimization with Pre-Conditioning

Thibaut Boissin

157

1

0

04 Dec 2025

FedMuon: Accelerating Federated Learning with Matrix Orthogonalization

277

7

0

31 Oct 2025

How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data

Bhavya Vasudeva

Christos Thrampoulidis

343

3

0

27 Oct 2025

Noise-Adaptive Layerwise Learning Rates: Accelerating Geometry-Aware Optimization for Deep Neural Network Training

203

1

0

15 Oct 2025

An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

Robert Mansel Gower

180

8

0

10 Oct 2025

Optimal Scaling Needs Optimal Norm

Stefan Kesselheim

241

3

0

04 Oct 2025

Beyond Outliers: A Study of Optimizers Under Quantization

Georgios Vlassis

Alexandra Volkova

Torsten Hoefler

311

4

0

27 Sep 2025

A Stable Whitening Optimizer for Efficient Neural Network Training

513

8

0

08 Jun 2025

Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness

Kimon Antonakopoulos

Tony Silveti-Falls

387

9

0

02 Jun 2025

Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)

Kaja Gruntkowska

Peter Richtárik

534

35

0

19 May 2025

Don't be lazy: CompleteP enables compute-efficient deep transformers

Bin Claire Zhang

Cengiz Pehlevan

705

38

0

02 May 2025

ASGO: Adaptive Structured Gradient Optimization

551

38

0

26 Mar 2025

Function-Space Learning Rates

Laurence Aitchison

536

3

0

24 Feb 2025

Physics of Skill Learning

Eric J. Michaud

409

4

0

21 Jan 2025

FOCUS: First Order Concentrated Updating Scheme

427

4

0

21 Jan 2025

Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit

Stefan Kesselheim

423

8

0

10 Jan 2025

Modular Duality in Deep Learning

Jeremy Bernstein

212

40

0

28 Oct 2024

Old Optimizer, New Norm: An Anthology

Jeremy Bernstein

391

96

0

30 Sep 2024

u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

Bjorn Deiseroth

Andres Felipe Cruz Salinas

Carlo Luschi

Samuel Weinbach

407

20

0

24 Jul 2024
