2303.14006
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
24 March 2023
William Won
Taekyung Heo
Saeed Rashidi
Srinivas Sridharan
S. Srinivasan
T. Krishna
Papers citing "ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale"
6 / 6 papers shown
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin
Jingrong Chen
Xinhao Kong
Yongji Wu
Liang Luo
Z. Wang
Ying Zhang
Tingjun Chen
Alvin R. Lebeck
Danyang Zhuo
36
0
0
02 May 2025
Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning
Jinsun Yoo
ChonLam Lao
Lianjie Cao
Bob Lantz
Minlan Yu
Tushar Krishna
Puneet Sharma
47
0
0
29 Apr 2025
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jaehong Cho
Minsu Kim
Hyunmin Choi
Guseul Heo
Jongse Park
35
8
0
10 Aug 2024
Clio: A Hardware-Software Co-Designed Disaggregated Memory System
Zhiyuan Guo
Yizhou Shan
Xuhao Luo
Yutong Huang
Yiying Zhang
GNN
27
128
0
07 Aug 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
160
399
0
18 Jan 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019