Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation

18 March 2025
Ioannis Zarkadas
Amanda Tomlinson
Asaf Cidon
Baris Kasikci
Ofir Weisse
Abstract

As models become larger, ML accelerators are a scarce resource whose performance must be continually optimized to improve efficiency. Existing performance analysis tools are coarse-grained and fail to capture model performance at the machine-code level. In addition, these tools often do not provide specific recommendations for optimizations. We present xPU-Shark, a fine-grained methodology for analyzing ML models at the machine-code level that provides actionable optimization suggestions. Our core insight is to use a hardware-level simulator, an artifact of the hardware design process that we can repurpose for performance analysis. xPU-Shark captures traces from production deployments running on accelerators and replays them in a modified microarchitecture simulator to gain low-level insights into the model's performance. We implement xPU-Shark for our in-house accelerator and use it to analyze the performance of several of our production LLMs, revealing several previously unknown microarchitectural inefficiencies. Leveraging these insights, we optimize a common communication collective by up to 15% and reduce token generation latency by up to 4.1%.
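The abstract describes replaying captured traces in a simulator to attribute performance to individual machine-code instructions. As a minimal illustration only, the sketch below assumes a hypothetical trace format (`TraceEntry`), made-up per-opcode latencies (`LATENCY`), and a toy single-issue, in-order core. It is not the authors' modified microarchitecture simulator. It replays a trace, charges the cycles each instruction spends waiting on its operands to that instruction's address, and ranks the worst offenders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TraceEntry:
    """One executed instruction in a captured trace (hypothetical format)."""
    pc: int                  # instruction address
    opcode: str              # e.g. "load", "matmul", "add", "send"
    dst: Optional[str]       # register written, if any
    srcs: Tuple[str, ...]    # registers read


# Hypothetical latencies in cycles; a real simulator models the actual microarchitecture.
LATENCY = {"load": 4, "matmul": 8, "add": 1, "send": 6}


def replay(trace: List[TraceEntry]) -> Tuple[int, Dict[int, int]]:
    """Replay a trace on a toy single-issue, in-order core.

    Returns the total cycle count and the operand-wait stall cycles
    attributed to each instruction address.
    """
    ready_at: Dict[str, int] = {}              # register -> cycle its value is available
    stalls: Dict[int, int] = defaultdict(int)  # pc -> cycles spent waiting on operands
    cycle = 0                                  # next cycle at which an instruction may issue
    done = 0                                   # latest completion cycle seen so far
    for inst in trace:
        # An instruction issues only once all of its source operands are ready.
        operand_ready = max((ready_at.get(r, 0) for r in inst.srcs), default=0)
        if operand_ready > cycle:
            stalls[inst.pc] += operand_ready - cycle
            cycle = operand_ready
        finish = cycle + LATENCY[inst.opcode]
        done = max(done, finish)
        if inst.dst is not None:
            ready_at[inst.dst] = finish
        cycle += 1  # issuing takes one cycle
    return max(cycle, done), dict(stalls)


def hotspots(stalls: Dict[int, int], k: int = 3) -> List[Tuple[int, int]]:
    """Rank instruction addresses by attributed stall cycles, worst first."""
    return sorted(stalls.items(), key=lambda kv: kv[1], reverse=True)[:k]


# A tiny hypothetical trace: two loads feed a matmul, whose result is added and sent.
EXAMPLE_TRACE = [
    TraceEntry(0x00, "load", "r1", ()),
    TraceEntry(0x04, "load", "r2", ()),
    TraceEntry(0x08, "matmul", "r3", ("r1", "r2")),
    TraceEntry(0x0C, "add", "r4", ("r3",)),
    TraceEntry(0x10, "send", None, ("r4",)),
]
```

On this example trace, `replay(EXAMPLE_TRACE)` reports 20 total cycles. Most of the stalls land on the `add` at `0x0C`, which waits on the long-latency `matmul`. Per-instruction attribution of this kind is what separates machine-code-level analysis from coarse, per-kernel timing.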

@article{zarkadas2025_2503.14781,
  title={Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation},
  author={Ioannis Zarkadas and Amanda Tomlinson and Asaf Cidon and Baris Kasikci and Ofir Weisse},
  journal={arXiv preprint arXiv:2503.14781},
  year={2025}
}