An In-Depth Analysis of the Slingshot Interconnect

20 August 2020

Daniele De Sensi

Salvatore Di Girolamo

Papers citing "An In-Depth Analysis of the Slingshot Interconnect"

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures. Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, ..., Wenfeng Liang, Ying He, Yun Wang, Yuxuan Liu, Y. X. Wei. 14 May 2025.
Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning. Jinsun Yoo, ChonLam Lao, Lianjie Cao, Bob Lantz, Minlan Yu, Tushar Krishna, Puneet Sharma. 29 Apr 2025.
GPU-centric Communication Schemes for HPC and ML Applications. Naveen Namashivayam. 31 Mar 2025.
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning. Lang Xu, Quentin G. Anthony, Jacob Hatef, Hari Subramoni, Dhabaleswar K. Panda. 08 Jan 2025.
A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale. Wesley Brewer, Matthias Maiterth, Vineet Kumar, Rafal Wojda, Sedrick Bouknight, ..., Woong Shin, Scott Greenwood, David Grant, Wesley Williams, Feiyi Wang. 07 Oct 2024.
Performance and scaling of the LFRic weather and climate model on different generations of HPE Cray EX supercomputers. J. Mark Bull, Andrew Coughtrie, Deva Deeptimahanti, Mark Hedley, Caoimhín Laoide-Kemp, C. Maynard, Harry Shepherd, Sebastiaan van de Bund, Michèle Weiland, Benjamin Went. 24 Sep 2024.
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects. Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, T. De Matteis, Zebin Ren, ..., Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler. 26 Aug 2024.
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI. Mikhail Khalilov, Salvatore Di Girolamo, Marcin Chrapek, Rami Nudelman, Gil Bloch, Torsten Hoefler. 23 Aug 2024.
Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip. Luigi Fusco, Mikhail Khalilov, Marcin Chrapek, G. Chukkapalli, T. Schulthess, Torsten Hoefler. 21 Aug 2024.
UNR: Unified Notifiable RMA Library for HPC. Guangnan Feng, Jiabin Xie, Dezun Dong, Yutong Lu. 14 Aug 2024.
HiCCL: A Hierarchical Collective Communication Library. Mert Hidayetoğlu, Simon Garcia De Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei W. Hwu, William Gropp, Alex Aiken. 12 Aug 2024.
Enabling Message Passing Interface Containers on the LUMI Supercomputer. Alfio Lazzaro. 29 Jul 2024.
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression. Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, ..., Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao. 05 Jul 2024.
Supercomputers as a Continous Medium. Martin Karp, Niclas Jansson, P. Schlatter, Stefano Markidis. 09 May 2024.
Near-Optimal Wafer-Scale Reduce. Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler. 24 Apr 2024.
Q-adaptive: A Multi-Agent Reinforcement Learning Based Routing on Dragonfly Network. Yao Kang, Xin Wang, Z. Lan. 24 Mar 2024.
Study of Workload Interference with Intelligent Routing on Dragonfly. Yao Kang, Xin Wang, Z. Lan. 24 Mar 2024.
Swing: Short-cutting Rings for Higher Bandwidth Allreduce. Daniele De Sensi, Tommaso Bonato, D. Saam, Torsten Hoefler. 17 Jan 2024.
Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees. Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler. 28 Sep 2023.
Implementation-Oblivious Transparent Checkpoint-Restart for MPI. Yao Xu, Leonid Belyaev, Twinkle Jain, Derek Schafer, Anthony Skjellum, Gene Cooperman. 26 Sep 2023.
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters. Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, ..., Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yan-Hua Guo, R. Thakur. 09 Aug 2023.
The ExaNeSt Prototype: Evaluation of Efficient HPC Communication Hardware in an ARM-based Multi-FPGA Rack. Manolis Ploumidis, Fabien Chaix, Nikolaos Chrysos, Marios Assiminakis, Vassilis Flouris, ..., Theocharis Vavouris, M. Katevenis, Vassilis D. Papaefstathiou, M. Marazakis, I. Mavroidis. 18 Jul 2023.
Exploring Fully Offloaded GPU Stream-Aware Message Passing. N. Namashivayam, K. Kandalla, J. White, L. Kaplan, M. Pagel. 27 Jun 2023.
Evaluating the Potential of Disaggregated Memory Systems for HPC applications. Nan Ding, Pieter Maris, H. Nam, Taylor L. Groves, M. Awan, ..., C. Daley, Oguz Selvitopi, L. Oliker, N. Wright, Samuel Williams. 06 Jun 2023.
Datacenter Ethernet and RDMA: Issues at Hyperscale. Torsten Hoefler, Duncan Roweth, K. Underwood, Bob Alverson, Mark Griswold, ..., Surendra Anubolu, Siyan Shen, A. Kabbani, M. McLaren, Steve Scott. 07 Feb 2023.
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics. George Michelogiannakis, Yehia Arafa, B. Cook, Liang Yuan Dai, Abdel-Hameed A. Badawy, Madeleine Glick, Yuyang Wang, Keren Bergman, J. Shalf. 09 Jan 2023.
Efficient RDMA Communication Protocols. Konstantin Taranov, Fabian Fischer, Torsten Hoefler. 18 Dec 2022.
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability. Daniele De Sensi, T. De Matteis, Konstantin Taranov, Salvatore Di Girolamo, Tobias Rahn, Torsten Hoefler. 27 Oct 2022.
HammingMesh: A Network Topology for Large-Scale Deep Learning. Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott. 03 Sep 2022.
Exploring GPU Stream-Aware Message Passing using Triggered Operations. N. Namashivayam, K. Kandalla, Trey White, N. Radcliffe, L. Kaplan, M. Pagel. 09 Aug 2022.
Near-Optimal Sparse Allreduce for Distributed Deep Learning. Shigang Li, Torsten Hoefler. 19 Jan 2022.
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. Shigang Li, Torsten Hoefler. 14 Jul 2021.
Flare: Flexible In-Network Allreduce. Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler. 29 Jun 2021.
Application-aware Congestion Mitigation for High-Performance Computing Systems. Archit Patke, Saurabh Jha, Haoran Qiu, J. Brandt, A. Gentile, Joe Greenseid, Zbigniew T. Kalbarczyk, Ravishankar Iyer. 14 Dec 2020.
PsPIN: A high-performance low-power architecture for flexible in-network compute. Salvatore Di Girolamo, Andreas Kurth, A. Calotoiu, Thomas Emanuel Benz, Timo Schneider, Jakub Beránek, Luca Benini, Torsten Hoefler. 07 Oct 2020.
Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing. Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler. 17 Sep 2019.
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations. Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler. 12 Aug 2019.
Deploying a Top-100 Supercomputer for Large Parallel Workloads: the Niagara Supercomputer. Marcelo Ponce, R. van Zon, Scott A. Northrup, D. Gruner, Joseph M. Chen, ..., M. Saldarriaga, Vladimir Slavnic, Erik Spence, Ching-Hsing Yu, W. Peltier. 31 Jul 2019.
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. Tal Ben-Nun, Maciej Besta, Simon Huber, A. Ziogas, D. Peter, Torsten Hoefler. 29 Jan 2019.