ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

24 March 2023 (arXiv:2303.14006)
William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

Papers citing "ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale"

6 papers
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin, Jingrong Chen, Xinhao Kong, Yongji Wu, Liang Luo, Z. Wang, Ying Zhang, Tingjun Chen, Alvin R. Lebeck, Danyang Zhuo
02 May 2025

Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning
Jinsun Yoo, ChonLam Lao, Lianjie Cao, Bob Lantz, Minlan Yu, Tushar Krishna, Puneet Sharma
29 Apr 2025

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park
10 Aug 2024

Clio: A Hardware-Software Co-Designed Disaggregated Memory System
Zhiyuan Guo, Yizhou Shan, Xuhao Luo, Yutong Huang, Yiying Zhang
Community: GNN
07 Aug 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
Community: MoE
18 Jan 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
Community: MoE
17 Sep 2019