arXiv: 2407.21118
Palu: Compressing KV-Cache with Low-Rank Projection
30 July 2024
Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, N. Huang, Luis Ceze, Kai-Chiang Wu
Papers citing "Palu: Compressing KV-Cache with Low-Rank Projection" (6 of 6 shown)
Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
LLMAG, KELM · 03 Apr 2025
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen
VLM · 28 Oct 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo
30 Jul 2024
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa
MQ · 06 Feb 2024
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman
VLM · 26 Jan 2024
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021