KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
6 February 2025 · arXiv:2502.04420
Xianrui Li
Zeyu Xing
Yongqian Li
Linping Qu
Hui-Ling Zhen
Wulong Liu
Yiwu Yao
Sinno Jialin Pan
Mingxuan Yuan
Links: arXiv (abs) · PDF · HTML · GitHub (24★)