QAQ: Quality Adaptive Quantization for LLM KV Cache
arXiv:2403.04643 · 7 March 2024
Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang [MQ]

Papers citing "QAQ: Quality Adaptive Quantization for LLM KV Cache" (8 papers shown)
1. TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
   A. Zandieh, Majid Daliri, Majid Hadian, Vahab Mirrokni [MQ] · 28 Apr 2025

2. GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
   M. Vu, Gerald Ebmer, Alexander Watcher, Marc-Philip Ecker, Giang Nguyen, Tobias Glueck · 18 Mar 2025

3. iServe: An Intent-based Serving System for LLMs
   Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar [VLM] · 08 Jan 2025

4. An Evolved Universal Transformer Memory
   Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang · 17 Oct 2024

5. QSpec: Speculative Decoding with Complementary Quantization Schemes
   Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu [MQ] · 15 Oct 2024

6. Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
   Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo · 29 Jul 2024

7. KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
   Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, ..., Hongye Jin, V. Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu · 01 Jul 2024

8. Efficient LLM Inference with Kcache
   Qiaozhi He, Zhihua Wu [RALM] · 28 Apr 2024