ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
Guangda Liu, C. Li, Jieru Zhao, Chenqi Zhang, M. Guo
arXiv:2412.03213, 4 December 2024
Papers citing "ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression" (5 of 5 papers shown):
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Zehao Fan, Garrett Gagnon, Zhenyu Liu, Liu Liu (9 May 2025)
Adaptive Computation Pruning for the Forgetting Transformer
Zhixuan Lin, J. Obando-Ceron, Xu Owen He, Aaron C. Courville (9 Apr 2025)
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
Yuxuan Zhu, Ali Falahati, David H. Yang, Mohammad Mohammadi Amiri (1 Apr 2025)
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bo Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, Anima Anandkumar (18 Feb 2025)
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Kan Zhu, Tian Tang, Qinyu Xu, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci (17 Feb 2025)