Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.06262
Cited By
On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference
9 February 2024
Siyu Ren
Kenny Q. Zhu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference"
23 / 23 papers shown
Title
KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
Yuxuan Tian
Zihan Wang
Yebo Peng
Aomufei Yuan
Z. Wang
Bairen Yi
Xin Liu
Yong Cui
Tong Yang
24
0
0
14 Apr 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao
Haoxuan You
Yang Sui
Can Qin
H. Wang
VLM
MQ
81
0
0
20 Mar 2025
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Shibo Jie
Yehui Tang
Kai Han
Zhi-Hong Deng
Jing Han
84
0
0
20 Mar 2025
GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
M. Vu
Gerald Ebmer
Alexander Watcher
Marc-Philip Ecker
Giang Nguyen
Tobias Glueck
49
0
0
18 Mar 2025
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Ziran Qin
Yuchen Cao
Mingbao Lin
Wen Hu
Shixuan Fan
Ke Cheng
Weiyao Lin
Jianguo Li
56
3
0
16 Mar 2025
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Zhongwei Wan
H. Shen
Xin Wang
C. Liu
Zheda Mai
M. Zhang
VLM
52
3
0
24 Feb 2025
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng
Junlin Lv
Yukun Cao
Xike Xie
S. K. Zhou
VLM
50
27
0
28 Jan 2025
ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
M. Zhong
Xikai Liu
C. Zhang
Yikun Lei
Yan Gao
Yao Hu
Kehai Chen
Min Zhang
70
0
0
12 Dec 2024
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression
Yefei He
Feng Chen
Jing Liu
Wenqi Shao
Hong Zhou
K. Zhang
Bohan Zhuang
VLM
31
11
0
11 Oct 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
71
1
0
20 Sep 2024
Finch: Prompt-guided Key-Value Cache Compression
Giulio Corallo
Paolo Papotti
33
0
0
31 Jul 2024
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
Zheng Wang
Boxiao Jin
Zhongzhi Yu
Minjia Zhang
MoMe
29
23
0
11 Jul 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Zhongwei Wan
Ziang Wu
Che Liu
Jinfa Huang
Zhihong Zhu
Peng Jin
Longyue Wang
Li Yuan
VLM
20
28
0
26 Jun 2024
Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
Zhiyu Guo
Hidetaka Kamigaito
Taro Watanabe
19
20
0
18 Jun 2024
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan
Xinjian Wu
Yu Zhang
Yi Xin
Chaofan Tao
...
Xin Wang
Siqi Luo
Jing Xiong
Mi Zhang
Mi Zhang
14
0
0
18 Jun 2024
CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling
Yu Bai
Xiyuan Zou
Heyan Huang
Sanxing Chen
Marc-Antoine Rondeau
Yang Gao
Jackie Chi Kit Cheung
14
3
0
17 Jun 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng
Zhenyu Ning
Jieru Zhao
Weihao Cui
Mengwei Xu
Liwei Guo
Xusheng Chen
Yizhou Shan
LLMAG
32
4
0
18 May 2024
CORM: Cache Optimization with Recent Message for Large Language Model Inference
Jincheng Dai
Zhuowei Huang
Haiyun Jiang
Chen Chen
Deng Cai
Wei Bi
Shuming Shi
14
3
0
24 Apr 2024
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Saleh Ashkboos
Maximilian L. Croci
Marcelo Gennari do Nascimento
Torsten Hoefler
James Hensman
VLM
122
143
0
26 Jan 2024
Transformers are Multi-State RNNs
Matanel Oren
Michael Hassid
Nir Yarden
Yossi Adi
Roy Schwartz
OffRL
19
34
0
11 Jan 2024
The Falcon Series of Open Language Models
Ebtesam Almazrouei
Hamza Alobeidli
Abdulaziz Alshamsi
Alessandro Cappelli
Ruxandra-Aimée Cojocaru
...
Quentin Malartic
Daniele Mazzotta
Badreddine Noune
B. Pannier
Guilherme Penedo
AI4TS
ALM
104
389
0
28 Nov 2023
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
Chi Han
Qifan Wang
Hao Peng
Wenhan Xiong
Yu Chen
Heng Ji
Sinong Wang
29
47
0
30 Aug 2023
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
246
1,982
0
28 Jul 2020
1