TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text (arXiv:2410.07590)

10 October 2024
Songshuo Lu
Hua Wang
Yutian Rong
Zhi Chen
Yaohua Tang

Papers citing "TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text"

6 papers shown

  1. Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs
     Hyungwoo Lee, Kihyun Kim, Jinwoo Kim, Jungmin So, Myung-Hoon Cha, H. Kim, James J. Kim, Youngjae Kim
     16 Apr 2025

  2. OSCAR: Online Soft Compression And Reranking
     Maxime Louis, Thibault Formal, Hervé Déjean, S. Clinchant
     17 Mar 2025

  3. Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
     Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch
     11 Mar 2025

  4. Leveraging Approximate Caching for Faster Retrieval-Augmented Generation
     Shai Bergman, Zhang Ji, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, M. Vos
     07 Mar 2025

  5. TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
     Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, ..., Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci
     28 Feb 2025

  6. Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
     Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, H. Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
     21 Dec 2024