Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.19708
Cited By
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
23 March 2024
Bin Gao
Zhuomin He
Puru Sharma
Qingxuan Kang
Djordje Jevdjic
Junbo Deng
Xingkun Yang
Zhou Yu
Pengfei Zuo
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention"
10 / 10 papers shown
Title
Ascendra: Dynamic Request Prioritization for Efficient LLM Serving
Azam Ikram
Xiang Li
Sameh Elnikety
S. Bagchi
47
0
0
29 Apr 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen
J. Li
Yixin Ji
Z. Yang
Tong Liu
Qingrong Xia
Xinyu Duan
Z. Wang
Baoxing Huai
M. Zhang
LLMAG
77
0
0
28 Apr 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
43
1
0
15 Jan 2025
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
84
4
0
28 Nov 2024
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
112
114
0
16 Dec 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
68
104
0
12 Dec 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
118
58
0
06 Oct 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
120
137
0
19 Sep 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
135
208
0
13 Mar 2023
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
276
1,490
0
27 Feb 2021
1