Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

11 March 2019
Ang Li
S. Song
Jieyang Chen
Jiajia Li
Xu Liu
Nathan R. Tallent
Kevin J. Barker
    GNN
arXiv: 1903.04611

Papers citing "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect"

14 / 14 papers shown
Accelerating Mixed-Precision Out-of-Core Cholesky Factorization with Static Task Scheduling
Jie Ren
Hatem Ltaief
Sameh Abdulah
David E. Keyes
LRM
11
2
0
13 Oct 2024
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi
William Won
S. Srinivasan
Puneet Gupta
Tushar Krishna
21
0
0
28 Jun 2024
Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots
Zekai Sun
Xiuxian Guan
Junming Wang
Haoze Song
Yuhao Qing
Tianxiang Shen
Dong Huang
Fangming Liu
Heming Cui
32
0
0
29 May 2024
Federated Learning Priorities Under the European Union Artificial Intelligence Act
Herbert Woisetschläger
Alexander Erben
Bill Marino
Shiqiang Wang
Nicholas D. Lane
R. Mayer
Hans-Arno Jacobsen
21
15
0
05 Feb 2024
Interconnect Bandwidth Heterogeneity on AMD MI250x and Infinity Fabric
Carl Pearson
GNN
17
8
0
28 Feb 2023
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino
Joshua L. Benjamin
G. Zervas
17
7
0
28 Nov 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong
Seungjae Moon
Junsoo Kim
Sungjae Lee
Minsub Kim
Dongsoo Lee
Joo-Young Kim
64
76
0
22 Sep 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
N. Kim
4
19
0
02 Feb 2022
Monitoring Collective Communication Among GPUs
Muhammet Abdullah Soytürk
Palwisha Akhtar
Erhan Tezcan
D. Unat
GNN
13
1
0
20 Oct 2021
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi
William Won
S. Srinivasan
Srinivas Sridharan
T. Krishna
GNN
17
29
0
09 Oct 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Jiarui Fang
Zilin Zhu
Shenggui Li
Hui Su
Yang Yu
Jie Zhou
Yang You
VLM
21
24
0
12 Aug 2021
Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data
Xiaodong Yu
Viktor V. Nikitin
Daniel J. Ching
Selin S. Aslan
D. Gursoy
Tekin Bicer
8
19
0
14 Jun 2021
Synthesizing Optimal Collective Algorithms
Zixian Cai
Zhengyang Liu
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Jacob Nelson
Olli Saarikivi
GNN
13
59
0
19 Aug 2020
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi
Matthew Denton
Srinivas Sridharan
S. Srinivasan
Amoghavarsha Suresh
Jade Nie
T. Krishna
13
45
0
30 Jun 2020