QAQ: Quality Adaptive Quantization for LLM KV Cache

7 March 2024 · arXiv:2403.04643
Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang
Community: MQ · Links: arXiv / PDF / HTML
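For readers unfamiliar with the topic, the sketch below illustrates the baseline idea of storing a KV cache at reduced precision. It is a generic per-tensor min-max uniform quantizer in NumPy, not the QAQ scheme proposed in the paper (which adapts quantization quality across the cache); the array shapes, the 4-bit setting, and all function names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of uniform KV-cache quantization (NOT the paper's
# QAQ method; a generic baseline shown only to fix ideas).

def quantize_uniform(x: np.ndarray, num_bits: int = 4):
    """Per-tensor min-max uniform quantization to num_bits integer codes."""
    qmax = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize_uniform(codes: np.ndarray, scale: float, lo: float):
    """Map integer codes back to approximate float values."""
    return codes.astype(np.float32) * scale + lo

# Toy key cache of shape (seq_len, head_dim), values are illustrative.
k_cache = np.random.randn(128, 64).astype(np.float32)
codes, scale, lo = quantize_uniform(k_cache, num_bits=4)
k_approx = dequantize_uniform(codes, scale, lo)
print("max abs reconstruction error:", np.abs(k_cache - k_approx).max())
```

The reconstruction error printed at the end is what quality-aware schemes such as QAQ aim to control while still shrinking the cache footprint.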

Papers citing "QAQ: Quality Adaptive Quantization for LLM KV Cache" (8 papers shown)

1. TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
   A. Zandieh, Majid Daliri, Majid Hadian, Vahab Mirrokni · MQ · 28 Apr 2025

2. GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
   M. Vu, Gerald Ebmer, Alexander Watcher, Marc-Philip Ecker, Giang Nguyen, Tobias Glueck · 18 Mar 2025

3. iServe: An Intent-based Serving System for LLMs
   Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar · VLM · 08 Jan 2025

4. An Evolved Universal Transformer Memory
   Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang · 17 Oct 2024

5. QSpec: Speculative Decoding with Complementary Quantization Schemes
   Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu · MQ · 15 Oct 2024

6. Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
   Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo · 29 Jul 2024

7. KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
   Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, ..., Hongye Jin, V. Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu · 01 Jul 2024

8. Efficient LLM Inference with Kcache
   Qiaozhi He, Zhihua Wu · RALM · 28 Apr 2024