1903.04611
Cited By
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
11 March 2019
Ang Li
S. Song
Jieyang Chen
Jiajia Li
Xu Liu
Nathan R. Tallent
Kevin J. Barker
GNN
Papers citing
"Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect"
14 / 14 papers shown
Accelerating Mixed-Precision Out-of-Core Cholesky Factorization with Static Task Scheduling
Jie Ren
Hatem Ltaief
Sameh Abdulah
David E. Keyes
LRM
11
2
0
13 Oct 2024
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi
William Won
S. Srinivasan
Puneet Gupta
Tushar Krishna
21
0
0
28 Jun 2024
Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots
Zekai Sun
Xiuxian Guan
Junming Wang
Haoze Song
Yuhao Qing
Tianxiang Shen
Dong Huang
Fangming Liu
Heming Cui
32
0
0
29 May 2024
Federated Learning Priorities Under the European Union Artificial Intelligence Act
Herbert Woisetschläger
Alexander Erben
Bill Marino
Shiqiang Wang
Nicholas D. Lane
R. Mayer
Hans-Arno Jacobsen
21
15
0
05 Feb 2024
Interconnect Bandwidth Heterogeneity on AMD MI250x and Infinity Fabric
Carl Pearson
GNN
17
8
0
28 Feb 2023
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino
Joshua L. Benjamin
G. Zervas
17
7
0
28 Nov 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong
Seungjae Moon
Junsoo Kim
Sungjae Lee
Minsub Kim
Dongsoo Lee
Joo-Young Kim
64
76
0
22 Sep 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
N. Kim
6
19
0
02 Feb 2022
Monitoring Collective Communication Among GPUs
Muhammet Abdullah Soytürk
Palwisha Akhtar
Erhan Tezcan
D. Unat
GNN
13
1
0
20 Oct 2021
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi
William Won
S. Srinivasan
Srinivas Sridharan
T. Krishna
GNN
17
29
0
09 Oct 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Jiarui Fang
Zilin Zhu
Shenggui Li
Hui Su
Yang Yu
Jie Zhou
Yang You
VLM
23
24
0
12 Aug 2021
Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data
Xiaodong Yu
Viktor V. Nikitin
Daniel J. Ching
Selin S. Aslan
D. Gursoy
Tekin Bicer
8
19
0
14 Jun 2021
Synthesizing Optimal Collective Algorithms
Zixian Cai
Zhengyang Liu
Saeed Maleki
Madan Musuvathi
Todd Mytkowicz
Jacob Nelson
Olli Saarikivi
GNN
13
59
0
19 Aug 2020
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Saeed Rashidi
Matthew Denton
Srinivas Sridharan
S. Srinivasan
Amoghavarsha Suresh
Jade Nie
T. Krishna
13
45
0
30 Jun 2020