arXiv:1707.09414
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
28 July 2017
A. A. Awan, Ching-Hsiang Chu, Hari Subramoni, D. Panda
GNN
Papers citing "Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?" (11 papers)
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, S. Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee
MQ
13
73
0
20 Jun 2022
From Distributed Machine Learning to Federated Learning: A Survey
Ji Liu, Jizhou Huang, Yang Zhou, Xuhong Li, Shilei Ji, Haoyi Xiong, Dejing Dou
FedML
OOD
44
243
0
29 Apr 2021
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
13
14
0
30 Apr 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
9
12
0
06 Mar 2020
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
11
316
0
31 May 2019
Priority-based Parameter Propagation for Distributed DNN Training
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko
AI4CE
9
178
0
10 May 2019
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
Ang Li, S. Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker
GNN
27
210
0
11 Mar 2019
Accelerated Training for CNN Distributed Deep Learning through Automatic Resource-Aware Layer Placement
Jay H. Park, Sunghwan Kim, Jinwon Lee, Myeongjae Jeon, S. Noh
17
11
0
17 Jan 2019
An Empirical Evaluation of Allgatherv on Multi-GPU Systems
Thomas B. Rolinger, T. Simon, Christopher D. Krieger
12
2
0
14 Dec 2018
A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
S. Shi, Qiang-qiang Wang, Xiaowen Chu, Bo-wen Li
FedML
GNN
9
23
0
10 May 2018
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
S. Shi, Qiang-qiang Wang, Xiaowen Chu
22
110
0
16 Nov 2017