Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), 2019
12 August 2019
Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler

Papers citing "Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations"

21 papers shown

OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025
S. Tyagi, Prateek Sharma
21 Mar 2025

WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem
Ziming Liu, Shaoyu Wang, Shenggan Cheng, Zhongkai Zhao, Xuanlei Zhao, James Demmel, Yang You
30 Jun 2024

HPCClusterScape: Increasing Transparency and Efficiency of Shared High-Performance Computing Clusters for Large-scale AI Models
Heungseok Park, Aeree Cho, Hyojun Jeon, Hayoung Lee, Youngil Yang, Sungjae Lee, Heungsub Lee, Jaegul Choo
03 Oct 2023

Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees
Future Generation Computer Systems (FGCS), 2023
Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler
28 Sep 2023

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Ziming Liu, Shenggan Cheng, Hao Zhou, Yang You
30 Aug 2023

ABS-SGD: A Delayed Synchronous Stochastic Gradient Descent Algorithm with Adaptive Batch Size for Heterogeneous GPU Clusters
Xin Zhou, Ling Chen, Houming Wu
29 Aug 2023

Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2023
Efstathia Soufleri, Gang Yan, Maroun Touma, Jian Li
11 Jun 2023

ADA-GP: Accelerating DNN Training By Adaptive Gradient Prediction
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023
Vahid Janfaza, Shantanu Mandal, Farabi Mahmud, A. Muzahid
22 May 2023

AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023
Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler
17 Jan 2023

Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Maciej Besta, Torsten Hoefler
19 May 2022

Asynchronous Fully-Decentralized SGD in the Cluster-Based Model
International Conference on Algorithms and Complexity (CIAC), 2022
Hagit Attiya, N. Schiller
22 Feb 2022

Near-Optimal Sparse Allreduce for Distributed Deep Learning
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), 2022
Shigang Li, Torsten Hoefler
19 Jan 2022

A Data-Centric Optimization Framework for Machine Learning
International Conference on Supercomputing (ICS), 2021
Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler
20 Oct 2021

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li, Torsten Hoefler
14 Jul 2021

Flare: Flexible In-Network Allreduce
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler
29 Jun 2021

CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation
International Conference on Parallel Processing (ICPP), 2021
Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao
21 Jun 2021

Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021
Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, P. Tsigas
17 Feb 2021

An In-Depth Analysis of the Slingshot Interconnect
Daniele De Sensi, Salvatore Di Girolamo, K. McMahon, Duncan Roweth, Torsten Hoefler
20 Aug 2020

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan
07 Jul 2020

Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2020
Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
30 Apr 2020

Asynchronous Decentralized SGD with Quantized and Local Updates
Neural Information Processing Systems (NeurIPS), 2019
Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh
27 Oct 2019