Inference-Time Hyper-Scaling with KV Cache Compression
arXiv:2506.05345 (v2, latest) · 5 June 2025
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot, Edoardo Ponti
Links: arXiv (abs) · PDF · HTML · HuggingFace (27 upvotes) · GitHub (25,621★)
Papers citing "Inference-Time Hyper-Scaling with KV Cache Compression" (6 of 6 papers shown)
1. Attention and Compression is all you need for Controllably Efficient Language Models
   Jatin Prakash, A. Puli, Rajesh Ranganath · MQ, VLM · 434 / 0 / 0 · 07 Nov 2025
2. KV Cache Transform Coding for Compact Storage in LLM Inference
   Konrad Staniszewski, Adrian Łańcucki · VLM · 260 / 0 / 0 · 03 Nov 2025
3. Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
   Mutian He, Philip N. Garner · CLL · 232 / 0 / 0 · 23 Oct 2025
4. AsyncSpade: Efficient Test-Time Scaling with Asynchronous Sparse Decoding
   Shuqing Luo, Yilin Guan, Pingzhi Li, Hanrui Wang, Tianlong Chen · 108 / 0 / 0 · 08 Oct 2025
5. On the Role of Temperature Sampling in Test-Time Scaling
   Yuheng Wu, Azalia Mirhoseini, Thierry Tambe · ALM, LRM · 89 / 1 / 1 · 02 Oct 2025
6. Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
   Alessio Devoto, Maximilian Jeblick, Simon Jégou · MQ, VLM · 84 / 2 / 0 · 01 Oct 2025