arXiv:1908.04207
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), 2020
12 August 2019
Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler
Papers citing "Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations"
21 citing papers (all listed)
OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025
S. Tyagi, Prateek Sharma
21 Mar 2025
WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem
Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Xuanlei Zhao, James Demmel, Yang You
30 Jun 2024
HPCClusterScape: Increasing Transparency and Efficiency of Shared High-Performance Computing Clusters for Large-scale AI Models
Heungseok Park, Aeree Cho, Hyojun Jeon, Hayoung Lee, Youngil Yang, Sungjae Lee, Heungsub Lee, Jaegul Choo
03 Oct 2023
Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees
Future Generation Computer Systems (FGCS), 2023
Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler
28 Sep 2023
Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Ziming Liu, Shenggan Cheng, Hao Zhou, Yang You
30 Aug 2023
ABS-SGD: A Delayed Synchronous Stochastic Gradient Descent Algorithm with Adaptive Batch Size for Heterogeneous GPU Clusters
Xin Zhou, Ling Chen, Houming Wu
29 Aug 2023
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2023
Efstathia Soufleri, Gang Yan, Maroun Touma, Jian Li
11 Jun 2023
ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023
Vahid Janfaza, Shantanu Mandal, Farabi Mahmud, A. Muzahid
22 May 2023
AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023
Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler
17 Jan 2023
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Maciej Besta, Torsten Hoefler
19 May 2022
Asynchronous Fully-Decentralized SGD in the Cluster-Based Model
International Conference on Algorithms and Complexity (CIAC), 2022
Hagit Attiya, N. Schiller
22 Feb 2022
Near-Optimal Sparse Allreduce for Distributed Deep Learning
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), 2022
Shigang Li, Torsten Hoefler
19 Jan 2022
A Data-Centric Optimization Framework for Machine Learning
International Conference on Supercomputing (ICS), 2021
Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler
20 Oct 2021
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li, Torsten Hoefler
14 Jul 2021
Flare: Flexible In-Network Allreduce
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler
29 Jun 2021
CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation
International Conference on Parallel Processing (ICPP), 2021
Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao
21 Jun 2021
Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021
Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, P. Tsigas
17 Feb 2021
An In-Depth Analysis of the Slingshot Interconnect
Daniele De Sensi, Salvatore Di Girolamo, K. McMahon, Duncan Roweth, Torsten Hoefler
20 Aug 2020
DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan
07 Jul 2020
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2020
Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
30 Apr 2020
Asynchronous Decentralized SGD with Quantized and Local Updates
Neural Information Processing Systems (NeurIPS), 2019
Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh
27 Oct 2019