SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

7 April 2024 · arXiv:2404.04793
Zihao Wang, Shaoduo Gan
ArXiv · PDF · HTML

Papers citing "SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget"

4 / 4 papers shown
Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
LLMAG, KELM · 69 · 1 · 0 · 03 Apr 2025
Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant J. Nair, Poulami Das
36 · 1 · 0 · 02 Mar 2025
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference
Wenxuan Zeng, Ye Dong, Jinjin Zhou, Junming Ma, Jin Tan, Runsheng Wang, Meng Li
47 · 0 · 0 · 12 Jan 2025
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
144 · 365 · 0 · 13 Mar 2023