ResearchTrend.AI

Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation

2 May 2025
Jianxing Qin
Jingrong Chen
Xinhao Kong
Yongji Wu
Liang Luo
Zhaodong Wang
Ying Zhang
Tingjun Chen
Alvin R. Lebeck
Danyang Zhuo
Abstract

To accommodate ever-increasing model complexity, modern machine learning (ML) systems have to scale to large GPU clusters. Changes in ML model architecture, ML system implementation, and cluster configuration can significantly affect overall ML system performance. However, quantifying the performance impact before deployment is challenging. Existing performance estimation methods use performance modeling or static workload simulation. These techniques are not general: they require significant human effort and compute capacity to generate training data or workloads, and adapting ML systems to use them is difficult. This paper introduces Phantora, a live GPU cluster simulator for performance estimation. Phantora runs minimally modified ML models and frameworks, intercepting and simulating GPU-related operations to enable high-fidelity performance estimation. Phantora overcomes several research challenges in integrating an event-driven network simulator with live system execution, and introduces a set of techniques to improve simulation speed, scalability, and accuracy. Our evaluation shows that Phantora delivers estimation accuracy comparable to the state-of-the-art workload simulation approach using only a single GPU, while reducing human effort and increasing generalizability.
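The abstract does not spell out how interception works, but the general idea of "live" simulation — running real framework code while replacing GPU operations with cost-model estimates on a virtual clock — can be sketched roughly as follows. This is an illustrative toy, not Phantora's implementation; `SimClock`, `intercept`, and the matmul cost model are all hypothetical names and assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SimClock:
    """Virtual clock advanced by simulated operation durations."""
    now: float = 0.0
    log: list = field(default_factory=list)

    def advance(self, op: str, duration_s: float) -> None:
        self.now += duration_s
        self.log.append((op, self.now))

def intercept(clock: SimClock, op_name: str, cost_model):
    """Wrap a GPU-bound operation so it is simulated instead of executed."""
    def wrapper(*args, **kwargs):
        clock.advance(op_name, cost_model(*args, **kwargs))
        return None  # real outputs are not materialized in simulation
    return wrapper

# Hypothetical cost model: matmul time scales with FLOPs at 1 TFLOP/s.
clock = SimClock()
matmul = intercept(clock, "matmul", lambda m, n, k: (m * n * k) / 1e12)
matmul(4096, 4096, 4096)  # simulated; no GPU needed
```

In a real live simulator, the wrapper would be installed over the framework's GPU kernel launches, and the clock would be driven by an event-driven simulator that also models network transfers between simulated workers.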

View on arXiv
@article{qin2025_2505.01616,
  title={Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation},
  author={Jianxing Qin and Jingrong Chen and Xinhao Kong and Yongji Wu and Liang Luo and Zhaodong Wang and Ying Zhang and Tingjun Chen and Alvin R. Lebeck and Danyang Zhuo},
  journal={arXiv preprint arXiv:2505.01616},
  year={2025}
}