Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

11 March 2019

Nathan R. Tallent

Kevin J. Barker

Papers citing "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect"

Title | Authors | Tag | Counts | Date
Accelerating Mixed-Precision Out-of-Core Cholesky Factorization with Static Task Scheduling | Jie Ren, Hatem Ltaief, Sameh Abdulah, David E. Keyes | LRM | 11 2 0 | 13 Oct 2024
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models | Saeed Rashidi, William Won, S. Srinivasan, Puneet Gupta, Tushar Krishna | - | 21 0 0 | 28 Jun 2024
Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots | Zekai Sun, Xiuxian Guan, Junming Wang, Haoze Song, Yuhao Qing, Tianxiang Shen, Dong Huang, Fangming Liu, Heming Cui | - | 32 0 0 | 29 May 2024
Federated Learning Priorities Under the European Union Artificial Intelligence Act | Herbert Woisetschläger, Alexander Erben, Bill Marino, Shiqiang Wang, Nicholas D. Lane, R. Mayer, Hans-Arno Jacobsen | - | 21 15 0 | 05 Feb 2024
Interconnect Bandwidth Heterogeneity on AMD MI250x and Infinity Fabric | Carl Pearson | GNN | 17 8 0 | 28 Feb 2023
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems | Alessandro Ottino, Joshua L. Benjamin, G. Zervas | - | 17 7 0 | 28 Nov 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation | Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim | - | 64 76 0 | 22 Sep 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers | Youjie Li, Amar Phanishayee, D. Murray, Jakub Tarnawski, N. Kim | - | 6 19 0 | 02 Feb 2022
Monitoring Collective Communication Among GPUs | Muhammet Abdullah Soytürk, Palwisha Akhtar, Erhan Tezcan, D. Unat | GNN | 13 1 0 | 20 Oct 2021
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models | Saeed Rashidi, William Won, S. Srinivasan, Srinivas Sridharan, T. Krishna | GNN | 17 29 0 | 09 Oct 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management | Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su, Yang Yu, Jie Zhou, Yang You | VLM | 23 24 0 | 12 Aug 2021
Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data | Xiaodong Yu, Viktor V. Nikitin, Daniel J. Ching, Selin S. Aslan, D. Gursoy, Tekin Bicer | - | 8 19 0 | 14 Jun 2021
Synthesizing Optimal Collective Algorithms | Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi | GNN | 13 59 0 | 19 Aug 2020
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms | Saeed Rashidi, Matthew Denton, Srinivas Sridharan, S. Srinivasan, Amoghavarsha Suresh, Jade Nie, T. Krishna | - | 13 45 0 | 30 Jun 2020