arXiv:1908.04207
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler
12 August 2019
Papers citing "Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations" (9 of 9 papers shown)
OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters. S. Tyagi, Prateek Sharma. 21 Mar 2025.
Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees. Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler. 28 Sep 2023.
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates. Guojun Xiong, Gang Yan, Shiqiang Wang, Jian Li. 11 Jun 2023.
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis. Maciej Besta, Torsten Hoefler. 19 May 2022. [GNN]
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. Shigang Li, Torsten Hoefler. 14 Jul 2021. [GNN, AI4CE, LRM]
Flare: Flexible In-Network Allreduce. Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler. 29 Jun 2021.
CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation. Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao. 21 Jun 2021.
Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence. Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, P. Tsigas. 17 Feb 2021.
DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training. Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan. 07 Jul 2020.