Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
arXiv: 2103.03239
4 March 2021
Max Ryabinin, Eduard A. Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko
Papers citing "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices" (8 of 8 papers shown)

Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Philipp A. Witte, Russell J. Hewett, Yin Tat Lee
25 Apr 2025

Decentralized Learning Made Practical with Client Sampling
M. Vos, Akash Dhasade, Anne-Marie Kermarrec, Erick Lavoie, J. Pouwelse, Rishi Sharma
27 Feb 2023

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [MoE]
Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov
27 Jan 2023

Towards Efficient Communications in Federated Learning: A Contemporary Survey [FedML]
Zihao Zhao, Yuzhu Mao, Yang Liu, Linqi Song, Ouyang Ye, Xinlei Chen, Wenbo Ding
02 Aug 2022

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization [MQ]
Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, V. Aksenov, Dan Alistarh, Daniel M. Roy
28 Apr 2021

Linearly Converging Error Compensated SGD
Eduard A. Gorbunov, D. Kovalev, Dmitry Makarenko, Peter Richtárik
23 Oct 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [MoE]
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019