Layer-Condensed KV Cache for Efficient Inference of Large Language Models (arXiv:2405.10637)
Haoyi Wu, Kewei Tu
17 May 2024
Papers citing "Layer-Condensed KV Cache for Efficient Inference of Large Language Models" (5 of 5 shown):
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu, Haoyi Wu, Kewei Tu
18 Oct 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo
30 Jul 2024
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
Haoyi Wu, Kewei Tu
26 Nov 2023
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu
10 Oct 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
13 Mar 2023