2008.08886
Cited By
An In-Depth Analysis of the Slingshot Interconnect
20 August 2020
Daniele De Sensi
Salvatore Di Girolamo
K. McMahon
Duncan Roweth
Torsten Hoefler
Papers citing
"An In-Depth Analysis of the Slingshot Interconnect"
39 / 39 papers shown
Title
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yun Wang
Yuxuan Liu
Y. X. Wei
MoE
72
1
0
14 May 2025
Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning
Jinsun Yoo
ChonLam Lao
Lianjie Cao
Bob Lantz
Minlan Yu
Tushar Krishna
Puneet Sharma
88
0
0
29 Apr 2025
GPU-centric Communication Schemes for HPC and ML Applications
Naveen Namashivayam
GNN
77
0
0
31 Mar 2025
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning
Lang Xu
Quentin G. Anthony
Jacob Hatef
Hari Subramoni
Dhabaleswar K. Panda
103
0
0
08 Jan 2025
A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale
Wesley Brewer
Matthias Maiterth
Vineet Kumar
Rafal Wojda
Sedrick Bouknight
...
Woong Shin
Scott Greenwood
David Grant
Wesley Williams
Feiyi Wang
ELM
3DGS
58
5
0
07 Oct 2024
Performance and scaling of the LFRic weather and climate model on different generations of HPE Cray EX supercomputers
J. Mark Bull
Andrew Coughtrie
Deva Deeptimahanti
Mark Hedley
Caoimhín Laoide-Kemp
C. Maynard
Harry Shepherd
Sebastiaan van de Bund
Michèle Weiland
Benjamin Went
52
0
0
24 Sep 2024
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Daniele De Sensi
Lorenzo Pichetti
Flavio Vella
T. De Matteis
Zebin Ren
...
Animesh Trivedi
Duncan Roweth
Filippo Spiga
Salvatore Di Girolamo
Torsten Hoefler
GNN
55
7
0
26 Aug 2024
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI
Mikhail Khalilov
Salvatore Di Girolamo
Marcin Chrapek
Rami Nudelman
Gil Bloch
Torsten Hoefler
FedML
28
5
0
23 Aug 2024
Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip
Luigi Fusco
Mikhail Khalilov
Marcin Chrapek
G. Chukkapalli
T. Schulthess
Torsten Hoefler
GNN
56
8
0
21 Aug 2024
UNR: Unified Notifiable RMA Library for HPC
Guangnan Feng
Jiabin Xie
Dezun Dong
Yutong Lu
35
1
0
14 Aug 2024
HiCCL: A Hierarchical Collective Communication Library
Mert Hidayetoğlu
Simon Garcia De Gonzalo
Elliott Slaughter
Pinku Surana
Wen-mei W. Hwu
William Gropp
Alex Aiken
39
2
0
12 Aug 2024
Enabling Message Passing Interface Containers on the LUMI Supercomputer
Alfio Lazzaro
45
0
0
29 Jul 2024
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
Hao Feng
Boyuan Zhang
Fanjiang Ye
Min Si
Ching-Hsiang Chu
...
Summer Deng
Yuchen Hao
Pavan Balaji
Tong Geng
Dingwen Tao
AI4CE
80
2
0
05 Jul 2024
Supercomputers as a Continous Medium
Martin Karp
Niclas Jansson
P. Schlatter
Stefano Markidis
39
0
0
09 May 2024
Near-Optimal Wafer-Scale Reduce
Piotr Luczynski
Lukas Gianinazzi
Patrick Iff
Leighton Wilson
Daniele De Sensi
Torsten Hoefler
123
5
0
24 Apr 2024
Q-adaptive: A Multi-Agent Reinforcement Learning Based Routing on Dragonfly Network
Yao Kang
Xin Wang
Z. Lan
69
15
0
24 Mar 2024
Study of Workload Interference with Intelligent Routing on Dragonfly
Yao Kang
Xin Wang
Z. Lan
63
10
0
24 Mar 2024
Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Daniele De Sensi
Tommaso Bonato
D. Saam
Torsten Hoefler
72
9
0
17 Jan 2024
Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees
Daniele De Sensi
Edgar Costa Molero
Salvatore Di Girolamo
Laurent Vanbever
Torsten Hoefler
56
4
0
28 Sep 2023
Implementation-Oblivious Transparent Checkpoint-Restart for MPI
Yao Xu
Leonid Belyaev
Twinkle Jain
Derek Schafer
Anthony Skjellum
Gene Cooperman
VLM
36
3
0
26 Sep 2023
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
Jiajun Huang
Sheng Di
Xiaodong Yu
Yujia Zhai
Jinyang Liu
...
Xiaoyi Lu
Zizhong Chen
Franck Cappello
Yan-Hua Guo
R. Thakur
63
9
0
09 Aug 2023
The ExaNeSt Prototype: Evaluation of Efficient HPC Communication Hardware in an ARM-based Multi-FPGA Rack
Manolis Ploumidis
Fabien Chaix
Nikolaos Chrysos
Marios Assiminakis
Vassilis Flouris
...
Theocharis Vavouris
M. Katevenis
Vassilis D. Papaefstathiou
M. Marazakis
I. Mavroidis
133
1
0
18 Jul 2023
Exploring Fully Offloaded GPU Stream-Aware Message Passing
N. Namashivayam
K. Kandalla
J. White
L. Kaplan
M. Pagel
28
3
0
27 Jun 2023
Evaluating the Potential of Disaggregated Memory Systems for HPC applications
Nan Ding
Pieter Maris
H. Nam
Taylor L. Groves
M. Awan
...
C. Daley
Oguz Selvitopi
L. Oliker
N. Wright
Samuel Williams
42
5
0
06 Jun 2023
Datacenter Ethernet and RDMA: Issues at Hyperscale
Torsten Hoefler
Duncan Roweth
K. Underwood
Bob Alverson
Mark Griswold
...
Surendra Anubolu
Siyan Shen
A. Kabbani
M. McLaren
Steve Scott
65
9
0
07 Feb 2023
Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics
George Michelogiannakis
Yehia Arafa
B. Cook
Liang Yuan Dai
Abdel-Hameed A. Badawy
Madeleine Glick
Yuyang Wang
Keren Bergman
J. Shalf
45
9
0
09 Jan 2023
Efficient RDMA Communication Protocols
Konstantin Taranov
Fabian Fischer
Torsten Hoefler
39
2
0
18 Dec 2022
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability
Daniele De Sensi
T. De Matteis
Konstantin Taranov
Salvatore Di Girolamo
Tobias Rahn
Torsten Hoefler
61
12
0
27 Oct 2022
HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler
Tommaso Bonato
Daniele De Sensi
Salvatore Di Girolamo
Shigang Li
Marco Heddes
Jon Belk
Deepak Goel
Miguel Castro
Steve Scott
3DH
GNN
AI4CE
79
23
0
03 Sep 2022
Exploring GPU Stream-Aware Message Passing using Triggered Operations
N. Namashivayam
K. Kandalla
Trey White
N. Radcliffe
L. Kaplan
M. Pagel
GNN
45
13
0
09 Aug 2022
Near-Optimal Sparse Allreduce for Distributed Deep Learning
Shigang Li
Torsten Hoefler
62
53
0
19 Jan 2022
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li
Torsten Hoefler
GNN
AI4CE
LRM
130
137
0
14 Jul 2021
Flare: Flexible In-Network Allreduce
Daniele De Sensi
Salvatore Di Girolamo
Saleh Ashkboos
Shigang Li
Torsten Hoefler
73
42
0
29 Jun 2021
Application-aware Congestion Mitigation for High-Performance Computing Systems
Archit Patke
Saurabh Jha
Haoran Qiu
J. Brandt
A. Gentile
Joe Greenseid
Zbigniew T. Kalbarczyk
Ravishankar Iyer
48
1
0
14 Dec 2020
PsPIN: A high-performance low-power architecture for flexible in-network compute
Salvatore Di Girolamo
Andreas Kurth
A. Calotoiu
Thomas Emanuel Benz
Timo Schneider
Jakub Beránek
Luca Benini
Torsten Hoefler
35
6
0
07 Oct 2020
Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing
Daniele De Sensi
Salvatore Di Girolamo
Torsten Hoefler
31
28
0
17 Sep 2019
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
Shigang Li
Tal Ben-Nun
Salvatore Di Girolamo
Dan Alistarh
Torsten Hoefler
147
59
0
12 Aug 2019
Deploying a Top-100 Supercomputer for Large Parallel Workloads: the Niagara Supercomputer
Marcelo Ponce
R. van Zon
Scott A. Northrup
D. Gruner
Joseph M. Chen
...
M. Saldarriaga
Vladimir Slavnic
Erik Spence
Ching-Hsing Yu
W. Peltier
84
166
0
31 Jul 2019
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Tal Ben-Nun
Maciej Besta
Simon Huber
A. Ziogas
D. Peter
Torsten Hoefler
ELM
ALM
69
77
0
29 Jan 2019