arXiv: 2406.03482
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
5 June 2024
A. Zandieh, Majid Daliri, Insu Han
MQ
Papers citing "QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead" (8 papers)
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
  A. Zandieh, Majid Daliri, Majid Hadian, Vahab Mirrokni (MQ), 28 Apr 2025

Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
  Yuxiang Huang, Binhang Yuan, Xu Han, Chaojun Xiao, Zhiyuan Liu (RALM), 02 Oct 2024

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
  Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, ..., Hongye Jin, V. Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu, 01 Jul 2024

QAQ: Quality Adaptive Quantization for LLM KV Cache
  Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang (MQ), 07 Mar 2024

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
  J. Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, S. Kwon, Dongsoo Lee (MQ), 28 Feb 2024

SubGen: Token Generation in Sublinear Time and Memory
  A. Zandieh, Insu Han, Vahab Mirrokni, Amin Karbasi, 08 Feb 2024

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
  Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang, 13 Mar 2023

Scaling Laws for Neural Language Models
  Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei, 23 Jan 2020