ResearchTrend.AI

Accelerating Retrieval-Augmented Language Model Serving with Speculation
arXiv:2401.14021 · 25 January 2024
Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, P. Phothilimthana, Zhihao Jia
Tags: RALM, KELM
Links: ArXiv (abs) · PDF · HTML · GitHub

Papers citing "Accelerating Retrieval-Augmented Language Model Serving with Speculation"

Showing 15 of 15 papers
Zero-RAG: Towards Retrieval-Augmented Generation with Zero Redundant Knowledge
Qi Luo, X. Li, Junqi Dai, Shuang Cheng, Xipeng Qiu
RALM · 400 · 1 · 0 · 01 Nov 2025

FastTTS: Accelerating Test-Time Scaling for Edge LLM Reasoning
Hao Mark Chen, Zhiwen Mo, Guanxi Lu, Shuang Liang, Lingxiao Ma, Wayne Luk, Hongxiang Fan
LRM · 258 · 0 · 0 · 29 Aug 2025

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, Tat-Seng Chua
KELM, LRM · 480 · 7 · 0 · 23 May 2025

Patchwork: A Unified Framework for RAG Serving
Bodun Hu, Luis Pabon, Saurabh Agarwal, Aditya Akella
283 · 0 · 0 · 01 May 2025

Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, Junlin Li, Yixin Ji, Zhiyong Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zehao Wang, Baoxing Huai, Hao Fei
LLMAG · 522 · 13 · 0 · 28 Apr 2025

Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
Heming Xia, Cunxiao Du, Yongqian Li, Qian Liu, Wenjie Li
395 · 4 · 0 · 01 Mar 2025

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Chien-Yu Lin, Keisuke Kamahori, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, ..., Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci
3DV, VLM · 1.1K · 10 · 0 · 28 Feb 2025

DReSD: Dense Retrieval for Speculative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Milan Gritta, Huiyin Xue, Gerasimos Lampouras
RALM · 610 · 2 · 0 · 21 Feb 2025

Vendi-RAG: Adaptively Trading-Off Diversity And Quality Significantly Improves Retrieval Augmented Generation With LLMs
Mohammad Reza Rezaei, Adji Bousso Dieng
VLM · 616 · 16 · 0 · 16 Feb 2025

Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Yining Qi, Yunfeng Fan, Qinliang Su, Xuemin Shen
AI4CE · 545 · 7 · 0 · 18 Dec 2024

AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning
Mohammad Reza Rezaei, Maziar Hafezi, Amit Satpathy, Lovell Hodge, Ebrahim Pourjafari
248 · 10 · 0 · 16 Oct 2024

RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin
496 · 99 · 0 · 18 Apr 2024

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Chak Tou Leong, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui
LRM · 564 · 240 · 0 · 15 Jan 2024

Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao, Zhitian Xie, Chen Liang, Chenyi Zhuang, Jinjie Gu
391 · 37 · 0 · 20 Dec 2023

Billion-scale similarity search with GPUs
IEEE Transactions on Big Data (TBD), 2017
Jeff Johnson, Matthijs Douze, Edouard Grave
1.3K · 4,864 · 0 · 28 Feb 2017