Effectively Compress KV Heads for LLM

11 June 2024

Zelan Yang

Papers citing "Effectively Compress KV Heads for LLM"

3 / 3 papers shown

Title
Cognitive Memory in Large Language Models Lianlei Shan Shixian Luo Zezhou Zhu Yu Yuan Yong Wu LLMAG KELM 69 1 0 03 Apr 2025
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Hanshi Sun Li-Wen Chang Wenlei Bao Size Zheng Ningxin Zheng Xin Liu Harry Dong Yuejie Chi Beidi Chen VLM 88 16 0 28 Oct 2024
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 237 690 0 27 Aug 2021