2406.03482
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
5 June 2024
A. Zandieh
Majid Daliri
Insu Han
MQ
Papers citing
"QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead"
Mitigating Diffusion Model Hallucinations with Dynamic Guidance
Kostas Triaridis
Alexandros Graikos
Aggelina Chatziagapi
Grigorios G. Chrysos
Dimitris Samaras
DiffM
06 Oct 2025
KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache
Fei Li
Song Liu
Weiguo Wu
Shiqiang Nie
Jinyu Wang
MQ
18 May 2025
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
A. Zandieh
Majid Daliri
Majid Hadian
Vahab Mirrokni
MQ
28 Apr 2025
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang
Ligong Han
Kai Xu
Akash Srivastava
MQ
31 Mar 2025
Compression Barriers for Autoregressive Transformers
Themistoklis Haris
Krzysztof Onak
21 Feb 2025
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
02 Oct 2024
A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
22 Aug 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen Zhong
Yu-Neng Chuang
...
Hongye Jin
Vipin Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
01 Jul 2024
Streaming Kernel PCA Algorithm With Small Space
Yichuan Deng
Zhao Song
Zifan Wang
Hangke Zhang
08 Mar 2023
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
06 Nov 2019