
Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning
arXiv: 1908.11848

IEEE International Conference on Distributed Computing Systems (ICDCS), 2019
16 August 2019
Xing Zhao, Aijun An, Junfeng Liu, Bin Chen

Papers citing "Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning"

18 citing papers
Hybrid Dual-Batch and Cyclic Progressive Learning for Efficient Distributed Training
Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu
30 Sep 2025
Dynamic Clustering for Personalized Federated Learning on Heterogeneous Edge Devices
Heting Liu, Junzhe Huang, Fang He, Guohong Cao
03 Aug 2025
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
ACM Computing Surveys (ACM CSUR), 2024
Zhihong Liu, Xin Xu, Peng Qiao, Dongsheng Li
08 Nov 2024
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Yijiao Wang, Hailin Zhang, Xiaonan Nie, Tengjiao Wang
17 Oct 2024
An Interdisciplinary Outlook on Large Language Models for Scientific Research
James Boyko, Joseph Cohen, Nathan Fox, Maria Han Veiga, Jennifer I-Hsiu Li, ..., Andreas H. Rauch, Kenneth N. Reid, Soumi Tribedi, Anastasia Visheratina, Xin Xie
03 Nov 2023
ABS-SGD: A Delayed Synchronous Stochastic Gradient Descent Algorithm with Adaptive Batch Size for Heterogeneous GPU Clusters
Xin Zhou, Ling Chen, Houming Wu
29 Aug 2023
OSP: Boosting Distributed Model Training with 2-stage Synchronization
International Conference on Parallel Processing (ICPP), 2023
Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu
29 Jun 2023
Communication-Efficient Federated Learning for Heterogeneous Edge Devices Based on Adaptive Gradient Quantization
IEEE Conference on Computer Communications (INFOCOM), 2022
Heting Liu, Fang He, Guohong Cao
16 Dec 2022
A Comprehensive Survey on Distributed Training of Graph Neural Networks
Proceedings of the IEEE (Proc. IEEE), 2022
Haiyang Lin, Yurui Lai, Xiaochun Ye, Shirui Pan, Wenguang Chen, Yuan Xie
10 Nov 2022
Fair and Efficient Distributed Edge Learning with Hybrid Multipath TCP
IEEE/ACM Transactions on Networking (TON), 2022
Mengyue Deng, Jinho Choi, A. Walid
03 Nov 2022
Byzantine Fault Tolerance in Distributed Machine Learning: A Survey
Djamila Bouhata, Hamouma Moumen, Moumen Hamouma, Ahcène Bounceur
05 May 2022
On the Future of Cloud Engineering
David Bermbach, A. Chandra, C. Krintz, A. Gokhale, Aleksander Slominski, L. Thamsen, Everton Cavalcante, Tian Guo, Ivona Brandić, R. Wolski
19 Aug 2021
Quantifying and Improving Performance of Distributed Deep Learning with Cloud Storage
Nicholas Krichevsky, M. S. Louis, Tian Guo
13 Aug 2021
Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
IEEE International Conference on Distributed Computing Systems (ICDCS), 2021
Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo
16 Apr 2021
BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training
Letian Zhao, Rui Xu, Tianqi Wang, Teng Tian, Xiaotian Wang, Wei Wu, Chio-in Ieong, Xi Jin
23 Dec 2020
A Fast Edge-Based Synchronizer for Tasks in Real-Time Artificial Intelligence Applications
IEEE Internet of Things Journal (IEEE IoT J.), 2020
R. Olaniyan, Muthucumaru Maheswaran
21 Dec 2020
A Quantitative Survey of Communication Optimizations in Distributed Deep Learning
Shaoshuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li
27 May 2020
Elastic Bulk Synchronous Parallel Model for Distributed Deep Learning
Industrial Conference on Data Mining (IDM), 2019
Xing Zhao, Manos Papagelis, Aijun An, Bin Chen, Junfeng Liu, Yonggang Hu
06 Jan 2020