Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.06497
Cited By
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
12 September 2023
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale"
18 / 18 papers shown
Title
Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization
Shunxian Gu
Chaoqun You
Bangbang Ren
Lailong Luo
Junxu Xia
Deke Guo
34
0
0
02 May 2025
WeatherMesh-3: Fast and accurate operational global weather forecasting
Haoxing Du
Lyna Kim
Joan Creus-Costa
Jack Michaels
Anuj Shetty
Todd Hutchinson
Christopher Riedel
John Dean
AI4Cl
32
1
0
28 Mar 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Rui Pan
Shiqian Ma
D. Goldfarb
Tong Zhang
ODL
90
2
0
26 Mar 2025
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
52
1
0
18 Mar 2025
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu
Zhenghao Xu
Zixuan Zhang
Hao Kang
Zichong Li
Chen Liang
Weizhu Chen
T. Zhao
102
1
0
24 Feb 2025
When, Where and Why to Average Weights?
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
91
0
0
10 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
45
0
0
10 Feb 2025
Modular Duality in Deep Learning
Jeremy Bernstein
Laker Newhouse
22
2
0
28 Oct 2024
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Jui-Nan Yen
Si Si
Zhao Meng
Felix X. Yu
Sai Surya Duvvuri
Inderjit Dhillon
Cho-Jui Hsieh
Sanjiv Kumar
27
1
0
27 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
112
0
0
07 Oct 2024
Old Optimizer, New Norm: An Anthology
Jeremy Bernstein
Laker Newhouse
ODL
36
12
0
30 Sep 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
Sham Kakade
59
23
0
17 Sep 2024
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
27
7
0
25 Jun 2024
Stochastic Hessian Fittings with Lie Groups
Xi-Lin Li
21
1
0
19 Feb 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Alireza Makhzani
ODL
46
12
0
05 Feb 2024
A Computationally Efficient Sparsified Online Newton Method
Fnu Devvrit
Sai Surya Duvvuri
Rohan Anil
Vineet Gupta
Cho-Jui Hsieh
Inderjit Dhillon
18
0
0
16 Nov 2023
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
Siddharth Singh
Zack Sating
A. Bhatele
ODL
25
0
0
18 Oct 2023
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi-Li Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
216
1,398
0
04 Dec 2018
1