Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

28 July 2017
A. A. Awan, Ching-Hsiang Chu, Hari Subramoni, D. Panda
Topics: GNN

Papers citing "Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?"

11 / 11 papers shown

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, S. Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee
Topics: MQ
20 Jun 2022

From Distributed Machine Learning to Federated Learning: A Survey
Ji Liu, Jizhou Huang, Yang Zhou, Xuhong Li, Shilei Ji, Haoyi Xiong, Dejing Dou
Topics: FedML, OOD
29 Apr 2021

Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
30 Apr 2020

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
06 Mar 2020

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
31 May 2019

Priority-based Parameter Propagation for Distributed DNN Training
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko
Topics: AI4CE
10 May 2019

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
Ang Li, S. Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker
Topics: GNN
11 Mar 2019

Accelerated Training for CNN Distributed Deep Learning through Automatic Resource-Aware Layer Placement
Jay H. Park, Sunghwan Kim, Jinwon Lee, Myeongjae Jeon, S. Noh
17 Jan 2019

An Empirical Evaluation of Allgatherv on Multi-GPU Systems
Thomas B. Rolinger, T. Simon, Christopher D. Krieger
14 Dec 2018

A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
S. Shi, Qiang-qiang Wang, Xiaowen Chu, Bo-wen Li
Topics: FedML, GNN
10 May 2018

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
S. Shi, Qiang-qiang Wang, Xiaowen Chu
16 Nov 2017