Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.10516
Cited By
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
3 January 2025
Di Liu
Meng Chen
Baotong Lu
Huiqiang Jiang
Zhenhua Han
Qianxi Zhang
Qi Chen
Chengruidong Zhang
Bailu Ding
K. Zhang
C. L. P. Chen
Fan Yang
Y. Yang
Lili Qiu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval"
18 / 18 papers shown
Title
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Y. Chen
J. Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
24
0
0
05 May 2025
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du
Wenyu Huang
Danna Zheng
Zhaowei Wang
Sébastien Montella
Mirella Lapata
Kam-Fai Wong
Jeff Z. Pan
KELM
MU
65
1
0
01 May 2025
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li
Huiqiang Jiang
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Jianfeng Gao
Y. Yang
Lili Qiu
24
1
0
22 Apr 2025
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
Yangshen Deng
Zhengxin You
Long Xiang
Qilong Li
Peiqi Yuan
...
Man Lung Yiu
Huan Li
Qiaomu Shen
Rui Mao
Bo Tang
26
0
0
14 Apr 2025
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
Yichao Yuan
Lin Ma
Nishil Talati
MoE
54
0
0
12 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
14
0
0
05 Apr 2025
PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference
Weisheng Jin
Maojia Song
Tej Deep Pala
Yew Ken Chia
Amir Zadeh
Chuan Li
Soujanya Poria
VLM
45
0
0
30 Mar 2025
AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation
Fengyu Li
Yilin Li
Junhao Zhu
Lu Chen
Yanfei Zhang
Jia Zhou
Hui Zu
Jingwen Zhao
Yunjun Gao
LLMAG
43
0
0
14 Mar 2025
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
G. Wang
Shubhangi Upasani
Chen Henry Wu
Darshan Gandhi
Jonathan Li
Changran Hu
Bo Li
Urmish Thakker
65
0
0
11 Mar 2025
Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
Qihui Zhou
Peiqi Yin
Pengfei Zuo
James Cheng
CLL
28
1
0
01 Mar 2025
Machine learning and high dimensional vector search
Matthijs Douze
55
0
0
24 Feb 2025
Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference
Q. Xiao
Jiachuan Wang
Haoyang Li
Cheng Deng
J. Tang
Shuangyin Li
Yongqi Zhang
Jun Wang
Lei Chen
LLMSV
33
1
0
20 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
82
3
0
12 Feb 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari
Haocheng Xi
Aditya Tomar
Coleman Hooper
Sehoon Kim
Maxwell Horton
Mahyar Najibi
Michael W. Mahoney
K. K.
Amir Gholami
MQ
31
1
0
05 Feb 2025
Twilight: Adaptive Attention Sparsity with Hierarchical Top-
p
p
p
Pruning
C. Lin
Jiaming Tang
Shuo Yang
Hanshuo Wang
Tian Tang
Boyu Tian
Ion Stoica
Song Han
Mingyu Gao
65
2
0
04 Feb 2025
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference
Wenxuan Zeng
Ye Dong
Jinjin Zhou
Junming Ma
Jin Tan
Runsheng Wang
Meng Li
35
0
0
12 Jan 2025
Squeezed Attention: Accelerating Long Context Length LLM Inference
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Monishwaran Maheswaran
June Paik
Michael W. Mahoney
K. K.
Amir Gholami
40
9
0
14 Nov 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
58
1
0
02 Jul 2024
1