Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation

21 March 2025

Papers citing "Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation"


Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Y. Li, ..., Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng
06 May 2025