Revisiting SLO and Goodput Metrics in LLM Serving

18 October 2024

Zhibin Wang

Xue Li

Papers citing "Revisiting SLO and Goodput Metrics in LLM Serving"

3 / 3 papers shown

Title
Faster MoE LLM Inference for Extremely Large Models Haoqi Yang Luohe Shi Qiwei Li Zuchao Li Ping Wang Bo Du Mengjia Shen Hai Zhao MoE 59 0 0 06 May 2025
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving Shan Yu Jiarong Xing Yifan Qiao Mingyuan Ma Y. Li ... Shiyi Cao Ke Bao Ion Stoica Harry Xu Ying Sheng 21 0 0 06 May 2025
Tempo: Application-aware LLM Serving with Mixed SLO Requirements Wei Zhang Zhiyu Wu Yi Mu Banruo Liu Myungjin Lee Fan Lai 51 0 0 24 Apr 2025