ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
arXiv:2410.21465
28 October 2024
Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen
VLM

Papers citing "ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference"

9 / 9 papers shown

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
Yuxuan Zhu, Ali Falahati, David H. Yang, Mohammad Mohammadi Amiri
01 Apr 2025
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Shibo Jie, Yehui Tang, Kai Han, Zhi-Hong Deng, Jing Han
20 Mar 2025
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
G. Wang, Shubhangi Upasani, Chen Henry Wu, Darshan Gandhi, Jonathan Li, Changran Hu, Bo Li, Urmish Thakker
11 Mar 2025
SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
Yankun Hong, Xing Li, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Mingxuan Yuan
MQ
24 Feb 2025
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bo Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, Anima Anandkumar
18 Feb 2025
BalanceKV: KV Cache Compression through Discrepancy Theory
Insu Han, Michael Kapralov, Ekaterina Kochetkova, Kshiteej Sheth, A. Zandieh
11 Feb 2025
Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao
11 Jan 2025
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang, Binhang Yuan, Xu Han, Chaojun Xiao, Zhiyuan Liu
RALM
02 Oct 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, ..., Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, L. Qiu
02 Jul 2024