QAQ: Quality Adaptive Quantization for LLM KV Cache

7 March 2024 · arXiv:2403.04643
Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang
Community: MQ · Links: arXiv / PDF / HTML
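For readers unfamiliar with the topic, the sketch below illustrates the baseline idea of storing a KV cache at reduced precision. It is a generic per-tensor min-max uniform quantizer in NumPy, not the QAQ scheme proposed in the paper (which adapts quantization quality across the cache); the array shapes, the 4-bit setting, and all function names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of uniform KV-cache quantization (NOT the paper's
# QAQ method; a generic baseline shown only to fix ideas).

def quantize_uniform(x: np.ndarray, num_bits: int = 4):
    """Per-tensor min-max uniform quantization to num_bits integer codes."""
    qmax = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize_uniform(codes: np.ndarray, scale: float, lo: float):
    """Map integer codes back to approximate float values."""
    return codes.astype(np.float32) * scale + lo

# Toy key cache of shape (seq_len, head_dim), values are illustrative.
k_cache = np.random.randn(128, 64).astype(np.float32)
codes, scale, lo = quantize_uniform(k_cache, num_bits=4)
k_approx = dequantize_uniform(codes, scale, lo)
print("max abs reconstruction error:", np.abs(k_cache - k_approx).max())
```

The reconstruction error printed at the end is what quality-aware schemes such as QAQ aim to control while still shrinking the cache footprint.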

Papers citing "QAQ: Quality Adaptive Quantization for LLM KV Cache" (8 papers shown)

1. TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
   A. Zandieh, Majid Daliri, Majid Hadian, Vahab Mirrokni · MQ · 28 Apr 2025

2. GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
   M. Vu, Gerald Ebmer, Alexander Watcher, Marc-Philip Ecker, Giang Nguyen, Tobias Glueck · 18 Mar 2025

3. iServe: An Intent-based Serving System for LLMs
   Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar · VLM · 08 Jan 2025

4. An Evolved Universal Transformer Memory
   Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang · 17 Oct 2024

5. QSpec: Speculative Decoding with Complementary Quantization Schemes
   Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu · MQ · 15 Oct 2024

6. Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
   Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo · 29 Jul 2024

7. KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
   Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, ..., Hongye Jin, V. Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu · 01 Jul 2024

8. Efficient LLM Inference with Kcache
   Qiaozhi He, Zhihua Wu · RALM · 28 Apr 2024