2404.12457
Cited By
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
18 April 2024
Chao Jin
Zili Zhang
Xuanlin Jiang
Fangyue Liu
Xin Liu
Xuanzhe Liu
Xin Jin
Papers citing
"RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation"
25 / 25 papers shown
Title
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
Yaxiong Wu
Sheng Liang
Chen Zhang
Y. Wang
Y. Zhang
Huifeng Guo
Ruiming Tang
Y. Liu
KELM
36
0
0
22 Apr 2025
Guillotine: Hypervisors for Isolating Malicious AIs
James Mickens
Sarah Radway
Ravi Netravali
20
0
0
22 Apr 2025
Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs
Hyungwoo Lee
Kihyun Kim
Jinwoo Kim
Jungmin So
Myung-Hoon Cha
H. Kim
James J. Kim
Youngjae Kim
30
0
0
16 Apr 2025
An Adaptive Vector Index Partitioning Scheme for Low-Latency RAG Pipeline
J. Kim
Divya Mahajan
VLM
42
0
0
11 Apr 2025
HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse
Yuwei An
Yihua Cheng
Seo Jin Park
Junchen Jiang
36
1
0
03 Apr 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Q. Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Enhong Chen
3DV
61
3
0
11 Mar 2025
Leveraging Approximate Caching for Faster Retrieval-Augmented Generation
Shai Bergman
Zhang Ji
Anne-Marie Kermarrec
Diana Petrescu
Rafael Pires
Mathis Randl
M. Vos
34
0
0
07 Mar 2025
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Chien-Yu Lin
Keisuke Kamahori
Yiyu Liu
Xiaoxiang Shi
Madhav Kashyap
...
Stephanie Wang
Arvind Krishnamurthy
Rohan Kadekodi
Luis Ceze
Baris Kasikci
3DV
VLM
55
0
0
28 Feb 2025
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang
Bairu Hou
Wei Wei
Yujia Bao
Shiyu Chang
VLM
36
2
0
21 Feb 2025
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
43
0
0
16 Feb 2025
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models
Qinggang Zhang
Shengyuan Chen
Yuanchen Bei
Zheng Yuan
Huachi Zhou
Zijin Hong
Junnan Dong
Hao-Heng Chen
Yi-Ju Chang
Xiao Huang
3DV
66
7
0
21 Jan 2025
RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices
Wonkyo Choe
Yangfeng Ji
F. Lin
59
1
0
14 Dec 2024
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
Songshuo Lu
Hua Wang
Yutian Rong
Zhi Chen
Yaohua Tang
VLM
25
11
0
10 Oct 2024
FAME: Towards Factual Multi-Task Model Editing
Li Zeng
Yingyu Shan
Zeming Liu
Jiashu Yao
Yuhang Guo
KELM
13
1
0
07 Oct 2024
LLMProxy: Reducing Cost to Access Large Language Models
Noah Martin
Abdullah Bin Faisal
Hiba Eltigani
Rukhshan Haroon
Swaminathan Lamelas
Fahad Dogar
31
1
0
04 Oct 2024
Geometric Collaborative Filtering with Convergence
Hisham Husain
Julien Monteil
FedML
23
5
0
04 Oct 2024
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
Linke Song
Zixuan Pang
Wenhao Wang
Zihao Wang
XiaoFeng Wang
Hongbo Chen
Wei Song
Yier Jin
Dan Meng
Rui Hou
36
6
0
30 Sep 2024
Do Large Language Models Need a Content Delivery Network?
Yihua Cheng
Kuntai Du
Jiayi Yao
Junchen Jiang
KELM
33
7
0
16 Sep 2024
MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation
Marco Simoni
Andrea Saracino
Vinod Puthuvath
Mauro Conti
47
1
0
22 Jul 2024
LLM Inference Serving: Survey of Recent Advances and Opportunities
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
70
15
0
17 Jul 2024
Teola: Towards End-to-End Optimization of LLM-based Applications
Xin Tan
Yimin Jiang
Yitao Yang
Hong-Yu Xu
48
4
0
29 Jun 2024
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Zhepei Wei
Wei-Lin Chen
Yu Meng
RALM
53
12
0
19 Jun 2024
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Jiayi Yao
Hanchen Li
Yuhan Liu
Siddhant Ray
Yihua Cheng
Qizheng Zhang
Kuntai Du
Shan Lu
Junchen Jiang
42
12
0
26 May 2024
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design
Wenqi Jiang
Shuai Zhang
Boran Han
Jie Wang
Bernie Wang
Tim Kraska
3DV
72
23
0
08 Mar 2024
LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
Keyang Xuan
Li Yi
Fan Yang
Ruochen Wu
Yi Ren Fung
Heng Ji
24
11
0
19 Feb 2024