A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

12 September 2023
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
    ODL

Papers citing "A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale"

18 / 18 papers shown
Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization
Shunxian Gu
Chaoqun You
Bangbang Ren
Lailong Luo
Junxu Xia
Deke Guo
34
0
0
02 May 2025
WeatherMesh-3: Fast and accurate operational global weather forecasting
Haoxing Du
Lyna Kim
Joan Creus-Costa
Jack Michaels
Anuj Shetty
Todd Hutchinson
Christopher Riedel
John Dean
AI4Cl
32
1
0
28 Mar 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Rui Pan
Shiqian Ma
D. Goldfarb
Tong Zhang
ODL
90
2
0
26 Mar 2025
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
52
1
0
18 Mar 2025
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu
Zhenghao Xu
Zixuan Zhang
Hao Kang
Zichong Li
Chen Liang
Weizhu Chen
T. Zhao
102
1
0
24 Feb 2025
When, Where and Why to Average Weights?
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
91
0
0
10 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
45
0
0
10 Feb 2025
Modular Duality in Deep Learning
Jeremy Bernstein
Laker Newhouse
22
2
0
28 Oct 2024
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Jui-Nan Yen
Si Si
Zhao Meng
Felix X. Yu
Sai Surya Duvvuri
Inderjit Dhillon
Cho-Jui Hsieh
Sanjiv Kumar
27
1
0
27 Oct 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
112
0
0
07 Oct 2024
Old Optimizer, New Norm: An Anthology
Jeremy Bernstein
Laker Newhouse
ODL
36
12
0
30 Sep 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
59
23
0
17 Sep 2024
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
27
7
0
25 Jun 2024
Stochastic Hessian Fittings with Lie Groups
Xi-Lin Li
21
1
0
19 Feb 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Alireza Makhzani
ODL
46
12
0
05 Feb 2024
A Computationally Efficient Sparsified Online Newton Method
Fnu Devvrit
Sai Surya Duvvuri
Rohan Anil
Vineet Gupta
Cho-Jui Hsieh
Inderjit Dhillon
18
0
0
16 Nov 2023
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
Siddharth Singh
Zack Sating
A. Bhatele
ODL
25
0
0
18 Oct 2023
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi-Li Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
216
1,398
0
04 Dec 2018