ResearchTrend.AI

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
arXiv:2404.08509

12 April 2024
Haoran Qiu
Weichao Mao
Archit Patke
Shengkun Cui
Saurabh Jha
Chen Wang
Hubertus Franke
Zbigniew T. Kalbarczyk
Tamer Basar
Ravishankar K. Iyer

Papers citing "Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction"

4 papers shown
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
28 Apr 2025
Tempo: Application-aware LLM Serving with Mixed SLO Requirements
Wei Zhang, Zhiyu Wu, Yi Mu, Banruo Liu, Myungjin Lee, Fan Lai
24 Apr 2025
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost
Sania Nayab, Giulio Rossolini, Giorgio Buttazzo, Nicolamaria Manes, Fabrizio Giacomelli
29 Jul 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
03 Feb 2024