TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
arXiv:2410.07590 · 10 October 2024
Songshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang
Papers citing "TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text" (6 / 6 papers shown)
- Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs — Hyungwoo Lee, Kihyun Kim, Jinwoo Kim, Jungmin So, Myung-Hoon Cha, H. Kim, James J. Kim, Youngjae Kim — 16 Apr 2025
- OSCAR: Online Soft Compression And Reranking — Maxime Louis, Thibault Formal, Hervé Déjean, S. Clinchant — 17 Mar 2025
- Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention — Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch — 11 Mar 2025
- Leveraging Approximate Caching for Faster Retrieval-Augmented Generation — Shai Bergman, Zhang Ji, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, M. Vos — 07 Mar 2025
- TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval — Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, ..., Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci — 28 Feb 2025
- Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models — Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, H. Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu — 21 Dec 2024