Communication optimization strategies for distributed deep neural network training: A survey
arXiv 2003.03009, 6 March 2020
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao

Papers citing "Communication optimization strategies for distributed deep neural network training: A survey"

Showing 50 of 67 citing papers (page 1 of 2).

Fair and Efficient Distributed Edge Learning with Hybrid Multipath TCP
IEEE/ACM Transactions on Networking (TON), 2022
Mengyue Deng, Jinho Choi, A. Walid
03 Nov 2022

HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring
Yuhao Zhou, Qing Ye, Hailun Zhang, Jiancheng Lv
06 Sep 2020

DBS: Dynamic Batch Size For Distributed Deep Neural Network Training
Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, Jiancheng Lv
23 Jul 2020

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Nie, T. Krishna
30 Jun 2020

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
Neural Information Processing Systems (NeurIPS), 2019
Farzin Haddadpour, Mohammad Mahdi Kamani, M. Mahdavi, V. Cadambe
30 Oct 2019

Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent
International Joint Conference on Artificial Intelligence (IJCAI), 2019
Shuheng Shen, Linli Xu, Jingchang Liu, Xianfeng Liang, Yifei Cheng
28 Jun 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Neural Information Processing Systems (NeurIPS), 2019
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
31 May 2019

Priority-based Parameter Propagation for Distributed DNN Training
SysML, 2019
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko
10 May 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
01 Apr 2019

Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds
Masafumi Yamazaki, Akihiko Kasagi, Akihiro Tabuchi, Takumi Honda, Masahiro Miwa, Naoto Fukumoto, Tsuguchika Tabaru, Atsushi Ike, Kohta Nakashima
29 Mar 2019

Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
R. Mayer, Hans-Arno Jacobsen
27 Mar 2019

Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
28 Jan 2019

Large-Batch Training for LSTM and Beyond
Yang You, Jonathan Hseu, Chris Ying, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
24 Jan 2019

MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms
Shaoshuai Shi, Xiaowen Chu, Bo Li
27 Nov 2018

Image Classification at Supercomputer Scale
Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng
16 Nov 2018

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
13 Nov 2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
Neural Information Processing Systems (NeurIPS), 2018
Timo C. Wunderlich, Zhifeng Lin, S. A. Aamir, Andreas Grübl, Youjie Li, David Stöckel, Alex Schwing, M. Annavaram, A. Avestimehr
08 Nov 2018

A Hitchhiker's Guide On Distributed Training of Deep Neural Networks
K. Chahal, Manraj Singh Grover, Kuntal Dey
28 Oct 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018

The Convergence of Sparsified Gradient Methods
Neural Information Processing Systems (NeurIPS), 2018
Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
27 Sep 2018

Sparsified SGD with Memory
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
20 Sep 2018

RedSync: Reducing Synchronization Traffic for Distributed Deep Learning
Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh
13 Aug 2018

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
Chencan Wu, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
30 Jul 2018

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
AAAI Conference on Artificial Intelligence (AAAI), 2018
Hao Yu, Sen Yang, Shenghuo Zhu
17 Jul 2018

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang
21 Jun 2018

ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang, Scott Sievert, Zachary B. Charles, Shengchao Liu, S. Wright, Dimitris Papailiopoulos
11 Jun 2018

Double Quantization for Communication-Efficient Distributed Optimization
Yue Yu, Jiaxiang Wu, Longbo Huang
25 May 2018

LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
Tianyi Chen, G. Giannakis, Tao Sun, W. Yin
25 May 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
24 May 2018

Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication
Felix Sattler, Simon Wiedemann, K. Müller, Wojciech Samek
22 May 2018

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, R. Campbell
08 Mar 2018

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
ACM Computing Surveys (CSUR), 2018
Tal Ben-Nun, Torsten Hoefler
26 Feb 2018

SparCML: High-Performance Sparse Communication for Machine Learning
Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler
22 Feb 2018

3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
Hyeontaek Lim, D. Andersen, M. Kaminsky
21 Feb 2018

Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev, Mike Del Balso
15 Feb 2018

signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein, Yu Wang, Kamyar Azizzadenesheli, Anima Anandkumar
13 Feb 2018

MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R. Mamidala, Georgios Kollias, C. Ward, F. Artico
11 Jan 2018

AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
07 Dec 2017

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
Takuya Akiba, Shuji Suzuki, Keisuke Fukuda
12 Nov 2017

Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu, Damian Podareanu, V. Saletore
12 Nov 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017

Gradient Sparsification for Communication-Efficient Distributed Optimization
Neural Information Processing Systems (NeurIPS), 2017
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
26 Oct 2017

PowerAI DDL
Minsik Cho, Ulrich Finkler, Sameer Kumar, David S. Kung, Vaibhav Saxena, D. Sreedhar
07 Aug 2017

On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou, Guojing Cong
03 Aug 2017

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
A. A. Awan, Ching-Hsiang Chu, Hari Subramoni, D. Panda
28 Jul 2017

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
USENIX Annual Technical Conference (USENIX ATC), 2017
Huatian Zhang, Zeyu Zheng, Shizhen Xu, Wei-Ming Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, P. Xie, Eric Xing
11 Jun 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
08 Jun 2017

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
25 May 2017

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
22 May 2017