Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
arXiv: 2407.11550
28 January 2025
Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, S. K. Zhou
VLM
Papers citing "Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference" (25 papers shown):
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Piotr Nawrot, Robert Li, Renjie Huang, Sebastian Ruder, Kelly Marchisio, E. Ponti
24 Apr 2025
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
Yaxiong Wu, Sheng Liang, Chen Zhang, Y. Wang, Y. Zhang, Huifeng Guo, Ruiming Tang, Y. Liu
KELM
22 Apr 2025
KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
Yuxuan Tian, Zihan Wang, Yebo Peng, Aomufei Yuan, Z. Wang, Bairen Yi, Xin Liu, Yong Cui, Tong Yang
14 Apr 2025
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
Yangshen Deng, Zhengxin You, Long Xiang, Qilong Li, Peiqi Yuan, ..., Man Lung Yiu, Huan Li, Qiaomu Shen, Rui Mao, Bo Tang
14 Apr 2025
Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling
Ziran Qin, Youru Lv, Mingbao Lin, Zeren Zhang, Danping Zou, Weiyao Lin
VLM
12 Apr 2025
Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models
Yu Fu, Haz Sameen Shahgir, Hui Liu, Xianfeng Tang, Qi He, Yue Dong
KELM
11 Apr 2025
LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important
Manlai Liang, JiaMing Zhang, Xiong Li, Jinlong Li
MQ
07 Apr 2025
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding
Sakhinana Sagar Srinivas, Venkataramana Runkana
OffRL
02 Apr 2025
GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
M. Vu, Gerald Ebmer, Alexander Watcher, Marc-Philip Ecker, Giang Nguyen, Tobias Glueck
18 Mar 2025
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie
VLM
16 Mar 2025
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
Giulio Corallo, Orion Weller, Fabio Petroni, Paolo Papotti
MQ, VLM
06 Mar 2025
CoKV: Optimizing KV Cache Allocation via Cooperative Game
Qiheng Sun, Hongwei Zhang, Haocheng Xia, Jiayao Zhang, Jinfei Liu, Kui Ren
VLM
21 Feb 2025
Neural Attention Search
Difan Deng, Marius Lindauer
21 Feb 2025
Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference
Q. Xiao, Jiachuan Wang, Haoyang Li, Cheng Deng, J. Tang, Shuangyin Li, Yongqi Zhang, Jun Wang, Lei Chen
LLMSV
20 Feb 2025
FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
Bingzhe Zhao, Ke Cheng, Aomufei Yuan, Yuxuan Tian, Ruiguang Zhong, Chengchen Hu, Tong Yang, Lian Yu
19 Feb 2025
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Kan Zhu, Tian Tang, Qinyu Xu, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci
17 Feb 2025
Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning
C. Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, Mingyu Gao
04 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu
04 Feb 2025
XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference
Weizhuo Li, Zhigang Wang, Yu Gu, Ge Yu
MQ
08 Dec 2024
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao
25 Oct 2024
Impacts of Continued Legal Pre-Training and IFT on LLMs' Latent Representations of Human-Defined Legal Concepts
Shaun Ho
AILaw
15 Oct 2024
CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs
Junlin Lv, Yuan Feng, Xike Xie, Xin Jia, Qirong Peng, Guiming Xie
19 Sep 2024
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, ..., Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao
04 Jun 2024
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr F. Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen
VLM
22 Apr 2024
A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems
Zihao Yi, Jiarui Ouyang, Yuwen Liu, Tianhao Liao, Zhe Xu, Ying Shen
LLMAG, LRM
28 Feb 2024