SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
arXiv: 2404.04793
7 April 2024
Zihao Wang, Shaoduo Gan
Papers citing "SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget" (4 of 4 papers shown):
1. Cognitive Memory in Large Language Models
   Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
   03 Apr 2025 (tags: LLMAG, KELM)
2. Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
   Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant J. Nair, Poulami Das
   02 Mar 2025
3. MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference
   Wenxuan Zeng, Ye Dong, Jinjin Zhou, Junming Ma, Jin Tan, Runsheng Wang, Meng Li
   12 Jan 2025
4. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
   Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
   13 Mar 2023