AuON: A Linear-time Alternative to Semi-Orthogonal Momentum Updates
Main: 23 pages, 14 figures, 10 tables; Bibliography: 2 pages; Appendix: 5 pages
Abstract
Orthogonal gradient updates have emerged as a promising direction in optimization for machine learning. However, classical approaches based on SVD or QR decomposition incur a prohibitive O(n^3) computational cost and underperform well-tuned SGD with momentum, because momentum is applied only after strict orthogonalization. Recent methods such as Muon improve efficiency by applying momentum before orthogonalization and producing semi-orthogonal matrices via Newton-Schulz iterations, reducing the complexity to O(n^2). Nevertheless, the quadratic cost remains a bottleneck.
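As a concrete illustration of the semi-orthogonalization step the abstract refers to, the sketch below shows a quintic Newton-Schulz iteration of the kind used in the public Muon implementation. The specific coefficients (3.4445, -4.7750, 2.0315) and the step count are assumptions taken from that implementation, not part of this paper; the iteration uses only matrix multiplies, avoiding SVD/QR entirely, and drives the singular values of the input into a band around 1 rather than exactly to 1.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately map G onto a semi-orthogonal matrix using a quintic
    Newton-Schulz iteration (coefficients as in the public Muon
    implementation -- treat them as an assumption here).
    Uses only matmuls, so no O(n^3) SVD/QR factorization is needed."""
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize so all singular values are <= 1 and the iteration is stable.
    X = G / (np.linalg.norm(G) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # work with the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        # Polynomial update p(X) = a*X + b*(XX^T)X + c*(XX^T)^2 X,
        # which pushes each singular value toward 1.
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

# Example: singular values of the result cluster near 1.
np.random.seed(0)
G = np.random.randn(4, 8)
X = newton_schulz_orthogonalize(G)
print(np.linalg.svd(X, compute_uv=False))
```

Note that with these speed-tuned coefficients the iteration does not converge exactly; the singular values oscillate in a neighborhood of 1, which is sufficient for the optimizer's purposes while keeping the per-step cost at a few matrix multiplies.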
