Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.14021
Cited By
Accelerating Retrieval-Augmented Language Model Serving with Speculation
25 January 2024
Zhihao Zhang
Alan Zhu
Lijie Yang
Yihua Xu
Lanting Li
P. Phothilimthana
Zhihao Jia
RALM
KELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Accelerating Retrieval-Augmented Language Model Serving with Speculation"
15 / 15 papers shown
Title
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen
J. Li
Yixin Ji
Z. Yang
Tong Liu
Qingrong Xia
Xinyu Duan
Z. Wang
Baoxing Huai
M. Zhang
LLMAG
77
0
0
28 Apr 2025
Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
Heming Xia
Cunxiao Du
Y. Li
Qian Liu
Wenjie Li
34
0
0
01 Mar 2025
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Chien-Yu Lin
Keisuke Kamahori
Yiyu Liu
Xiaoxiang Shi
Madhav Kashyap
...
Stephanie Wang
Arvind Krishnamurthy
Rohan Kadekodi
Luis Ceze
Baris Kasikci
3DV
VLM
55
0
0
28 Feb 2025
DReSD: Dense Retrieval for Speculative Decoding
Milan Gritta
Huiyin Xue
Gerasimos Lampouras
RALM
93
0
0
24 Feb 2025
Vendi-RAG: Adaptively Trading-Off Diversity And Quality Significantly Improves Retrieval Augmented Generation With LLMs
Mohammad Reza Rezaei
Adji Bousso Dieng
VLM
36
1
0
16 Feb 2025
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
110
1
0
18 Dec 2024
AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning
Mohammad Reza Rezaei
Maziar Hafezi
Amit Satpathy
Lovell Hodge
Ebrahim Pourjafari
18
2
0
16 Oct 2024
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin
Zili Zhang
Xuanlin Jiang
Fangyue Liu
Xin Liu
Xuanzhe Liu
Xin Jin
26
36
0
18 Apr 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Yongqi Li
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
22
95
0
15 Jan 2024
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
37
10
0
20 Dec 2023
RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text
Wangchunshu Zhou
Yuchen Eleanor Jiang
Peng Cui
Tiannan Wang
Zhenxin Xiao
Yifan Hou
Ryan Cotterell
Mrinmaya Sachan
RALM
LLMAG
76
58
0
22 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
206
1,701
0
07 Apr 2023
Training Language Models with Memory Augmentation
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
224
126
0
25 May 2022
Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
Uri Alon
Frank F. Xu
Junxian He
Sudipta Sengupta
Dan Roth
Graham Neubig
RALM
67
62
0
28 Jan 2022
Efficient Nearest Neighbor Language Models
Junxian He
Graham Neubig
Taylor Berg-Kirkpatrick
RALM
185
103
0
09 Sep 2021
1