Communication optimization strategies for distributed deep neural network training: A survey
arXiv 2003.03009, 6 March 2020
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao

Papers citing "Communication optimization strategies for distributed deep neural network training: A survey"

Showing 50 of 67 citing papers (page 1 of 2).

Fair and Efficient Distributed Edge Learning with Hybrid Multipath TCP
IEEE/ACM Transactions on Networking (TON), 2022
Mengyue Deng, Jinho Choi, A. Walid
03 Nov 2022

HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring
Yuhao Zhou, Qing Ye, Hailun Zhang, Jiancheng Lv
06 Sep 2020

DBS: Dynamic Batch Size For Distributed Deep Neural Network Training
Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, Jiancheng Lv
23 Jul 2020

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Nie, T. Krishna
30 Jun 2020

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
Neural Information Processing Systems (NeurIPS), 2019
Farzin Haddadpour, Mohammad Mahdi Kamani, M. Mahdavi, V. Cadambe
30 Oct 2019

Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent
International Joint Conference on Artificial Intelligence (IJCAI), 2019
Shuheng Shen, Linli Xu, Jingchang Liu, Xianfeng Liang, Yifei Cheng
28 Jun 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Neural Information Processing Systems (NeurIPS), 2019
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
31 May 2019

Priority-based Parameter Propagation for Distributed DNN Training
SysML, 2019
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko
10 May 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
01 Apr 2019

Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds
Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima
29 Mar 2019

Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
R. Mayer, Hans-Arno Jacobsen
27 Mar 2019

Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
28 Jan 2019

Large-Batch Training for LSTM and Beyond
Yang You, Jonathan Hseu, Chris Ying, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
24 Jan 2019

MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms
Shaoshuai Shi, Xiaowen Chu, Bo Li
27 Nov 2018

Image Classification at Supercomputer Scale
Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng
16 Nov 2018

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
13 Nov 2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
Neural Information Processing Systems (NeurIPS), 2018
Timo C. Wunderlich, Zhifeng Lin, S. A. Aamir, Andreas Grübl, Youjie Li, David Stöckel, Alex Schwing, M. Annavaram, A. Avestimehr
08 Nov 2018

A Hitchhiker's Guide On Distributed Training of Deep Neural Networks
K. Chahal, Manraj Singh Grover, Kuntal Dey
28 Oct 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018

The Convergence of Sparsified Gradient Methods
Neural Information Processing Systems (NeurIPS), 2018
Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
27 Sep 2018

Sparsified SGD with Memory
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
20 Sep 2018

RedSync: Reducing Synchronization Traffic for Distributed Deep Learning
Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh
13 Aug 2018

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
Chencan Wu, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
30 Jul 2018

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
AAAI Conference on Artificial Intelligence (AAAI), 2018
Hao Yu, Sen Yang, Shenghuo Zhu
17 Jul 2018

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang
21 Jun 2018

ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang, Scott Sievert, Zachary B. Charles, Shengchao Liu, S. Wright, Dimitris Papailiopoulos
11 Jun 2018

Double Quantization for Communication-Efficient Distributed Optimization
Yue Yu, Jiaxiang Wu, Longbo Huang
25 May 2018

LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
Tianyi Chen, G. Giannakis, Tao Sun, W. Yin
25 May 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
24 May 2018

Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication
Felix Sattler, Simon Wiedemann, K. Müller, Wojciech Samek
22 May 2018

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, R. Campbell
08 Mar 2018

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
ACM Computing Surveys (CSUR), 2018
Tal Ben-Nun, Torsten Hoefler
26 Feb 2018

SparCML: High-Performance Sparse Communication for Machine Learning
Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler
22 Feb 2018

3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
Hyeontaek Lim, D. Andersen, M. Kaminsky
21 Feb 2018

Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev, Mike Del Balso
15 Feb 2018

signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein, Yu Wang, Kamyar Azizzadenesheli, Anima Anandkumar
13 Feb 2018

MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala, Georgios Kollias, C. Ward, F. Artico
11 Jan 2018

AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
07 Dec 2017

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
Takuya Akiba, Shuji Suzuki, Keisuke Fukuda
12 Nov 2017

Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu, Damian Podareanu, V. Saletore
12 Nov 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017

Gradient Sparsification for Communication-Efficient Distributed Optimization
Neural Information Processing Systems (NeurIPS), 2017
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
26 Oct 2017

PowerAI DDL
Minsik Cho, Ulrich Finkler, Sameer Kumar, David S. Kung, Vaibhav Saxena, D. Sreedhar
07 Aug 2017

On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou, Guojing Cong
03 Aug 2017

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
A. A. Awan, Ching-Hsiang Chu, Hari Subramoni, D. Panda
28 Jul 2017

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
USENIX Annual Technical Conference (USENIX ATC), 2017
Huatian Zhang, Zeyu Zheng, Shizhen Xu, Wei-Ming Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, P. Xie, Eric Xing
11 Jun 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
08 Jun 2017

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
25 May 2017

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
22 May 2017