A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

12 September 2023

Hao-Jun Michael Shi

Tsung-Hsien Lee

Shintaro Iwasaki

Jose Gallego-Posada

Kaushik Rangadurai

Dheevatsa Mudigere

Papers citing "A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale"

Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization. Shunxian Gu, Chaoqun You, Bangbang Ren, Lailong Luo, Junxu Xia, Deke Guo. 02 May 2025.
WeatherMesh-3: Fast and accurate operational global weather forecasting. Haoxing Du, Lyna Kim, Joan Creus-Costa, Jack Michaels, Anuj Shetty, Todd Hutchinson, Christopher Riedel, John Dean. 28 Mar 2025.
ASGO: Adaptive Structured Gradient Optimization. Kang An, Yuxing Liu, Rui Pan, Shiqian Ma, D. Goldfarb, Tong Zhang. 26 Mar 2025.
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation. Yaxiong Chen, Yujie Wang, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou. 18 Mar 2025.
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs. Liming Liu, Zhenghao Xu, Zixuan Zhang, Hao Kang, Zichong Li, Chen Liang, Weizhu Chen, T. Zhao. 24 Feb 2025.
When, Where and Why to Average Weights? Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping. 10 Feb 2025.
Spectral-factorized Positive-definite Curvature Learning for NN Training. Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Roger B. Grosse. 10 Feb 2025.
Modular Duality in Deep Learning. Jeremy Bernstein, Laker Newhouse. 28 Oct 2024.
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization. Jui-Nan Yen, Si Si, Zhao Meng, Felix X. Yu, Sai Surya Duvvuri, Inderjit Dhillon, Cho-Jui Hsieh, Sanjiv Kumar. 27 Oct 2024.
DEPT: Decoupled Embeddings for Pre-training Language Models. Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F. Shen, Xinchi Qiu, Dongqi Cai, Yan Gao, Nicholas D. Lane. 07 Oct 2024.
Old Optimizer, New Norm: An Anthology. Jeremy Bernstein, Laker Newhouse. 30 Sep 2024.
SOAP: Improving and Stabilizing Shampoo using Adam. Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham Kakade. 17 Sep 2024.
A New Perspective on Shampoo's Preconditioner. Depen Morwani, Itai Shapira, Nikhil Vyas, Eran Malach, Sham Kakade, Lucas Janson. 25 Jun 2024.
Stochastic Hessian Fittings with Lie Groups. Xi-Lin Li. 19 Feb 2024.
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective. Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani. 05 Feb 2024.
A Computationally Efficient Sparsified Online Newton Method. Fnu Devvrit, Sai Surya Duvvuri, Rohan Anil, Vineet Gupta, Cho-Jui Hsieh, Inderjit Dhillon. 16 Nov 2023.
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization. Siddharth Singh, Zack Sating, A. Bhatele. 18 Oct 2023.
Bag of Tricks for Image Classification with Convolutional Neural Networks. Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. 04 Dec 2018.