TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text (arXiv:2410.07590)

10 October 2024
Songshuo Lu
Hua Wang
Yutian Rong
Zhi Chen
Yaohua Tang

Papers citing "TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text"

6 papers shown

  1. Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs
     Hyungwoo Lee, Kihyun Kim, Jinwoo Kim, Jungmin So, Myung-Hoon Cha, H. Kim, James J. Kim, Youngjae Kim
     16 Apr 2025

  2. OSCAR: Online Soft Compression And Reranking
     Maxime Louis, Thibault Formal, Hervé Déjean, S. Clinchant
     17 Mar 2025

  3. Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
     Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch
     11 Mar 2025

  4. Leveraging Approximate Caching for Faster Retrieval-Augmented Generation
     Shai Bergman, Zhang Ji, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, M. Vos
     07 Mar 2025

  5. TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
     Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, ..., Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci
     28 Feb 2025

  6. Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
     Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, H. Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
     21 Dec 2024