Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.07177
Cited By
Online Speculative Decoding
11 October 2023
Xiaoxuan Liu
Lanxiang Hu
Peter Bailis
Alvin Cheung
Zhijie Deng
Ion Stoica
Hao Zhang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Online Speculative Decoding"
40 / 40 papers shown
Title
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
36
0
0
08 May 2025
Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks
Yang Janet Liu
Bingjie Yan
Tianyuan Zou
Jianqing Zhang
Zixuan Gu
...
J. Li
Xiaozhou Ye
Ye Ouyang
Qiang Yang
Y. Zhang
ALM
89
1
0
24 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
19
0
0
05 Apr 2025
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding
Sakhinana Sagar Srinivas
Venkataramana Runkana
OffRL
45
1
0
02 Apr 2025
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Kaiyu Huang
Hao Wu
Zhubo Shi
Han Zou
Minchen Yu
Qingjiang Shi
LRM
36
1
0
07 Mar 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
S. Zhang
85
0
0
27 Feb 2025
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification
Penghui Yang
Cunxiao Du
Fengzhuo Zhang
Haonan Wang
Tianyu Pang
Chao Du
Bo An
RALM
45
0
0
24 Feb 2025
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu
Jingyang Li
Xingyu Xie
Zhihui Lu
Kim-Chuan Toh
Pan Zhou
38
0
0
16 Feb 2025
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Zikun Li
Zhuofu Chen
Remi Delacourt
Gabriele Oliaro
Zeyu Wang
...
Zhihao Zhang
Zhuoming Chen
Sean Lai
Xupeng Miao
Zhihao Jia
47
6
0
21 Jan 2025
Constrained Decoding with Speculative Lookaheads
Nishanth Nakshatri
Shamik Roy
Rajarshi Das
Suthee Chaidaroon
Leonid Boytsov
Rashmi Gangadharaiah
72
0
0
09 Dec 2024
The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation
Lawrence Stewart
Matthew Trager
Sujan Kumar Gonugondla
Stefano Soatto
45
5
0
06 Nov 2024
A Theoretical Perspective for Speculative Decoding Algorithm
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
32
0
0
30 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
44
4
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
29
7
0
08 Oct 2024
Efficient Inference for Large Language Model-based Generative Recommendation
Xinyu Lin
Chaoqun Yang
Wenjie Wang
Yongqi Li
Cunxiao Du
Fuli Feng
See-Kiong Ng
Tat-Seng Chua
65
4
0
07 Oct 2024
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
Wenyue Hua
Mengting Wan
Shashank Vadrevu
Ryan Nadel
Yongfeng Zhang
Chi Wang
LLMAG
24
1
0
30 Sep 2024
Learning Harmonized Representations for Speculative Sampling
Lefan Zhang
Xiaodan Wang
Yanhua Huang
Ruiwen Xu
16
0
0
28 Aug 2024
Knowledge boosting during low-latency inference
Vidya Srinivas
Malek Itani
Tuochao Chen
Sefik Emre Eskimez
Takuya Yoshioka
Shyamnath Gollakota
19
2
0
09 Jul 2024
Teola: Towards End-to-End Optimization of LLM-based Applications
Xin Tan
Yimin Jiang
Yitao Yang
Hong-Yu Xu
57
5
0
29 Jun 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
88
50
0
24 Jun 2024
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
Ruslan Svirschevski
Avner May
Zhuoming Chen
Beidi Chen
Zhihao Jia
Max Ryabinin
23
12
0
04 Jun 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
Mengdi Wang
32
17
0
30 May 2024
A Declarative System for Optimizing AI Workloads
Chunwei Liu
Matthew Russo
Michael Cafarella
Lei Cao
Peter Baille Chen
Zui Chen
Michael Franklin
Tim Kraska
Samuel Madden
Gerardo Vitagliano
34
20
0
23 May 2024
Beyond the Speculative Game: A Survey of Speculative Execution in Large Language Models
Chen Zhang
Zhuorui Liu
Dawei Song
LRM
28
3
0
23 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
80
0
22 Apr 2024
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Aonan Zhang
Chong-Jun Wang
Yi Wang
Xuanyu Zhang
Yunfei Cheng
26
15
0
14 Mar 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
37
79
0
26 Feb 2024
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
Ziqian Zeng
Jiahong Yu
Qianshi Pang
Zihao W. Wang
Huiping Zhuang
Cen Chen
Xiaofeng Zou
26
4
0
24 Feb 2024
Online Cascade Learning for Efficient Inference over Streams
Lunyiu Nie
Zhimin Ding
Erdong Hu
Christopher M. Jermaine
Swarat Chaudhuri
29
4
0
07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
123
137
0
03 Feb 2024
Decoding Speculative Decoding
Minghao Yan
Saurabh Agarwal
Shivaram Venkataraman
LRM
25
5
0
02 Feb 2024
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Pratyush Maini
Skyler Seto
Richard He Bai
David Grangier
Yizhe Zhang
Navdeep Jaitly
SyDa
33
54
0
29 Jan 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
26
116
0
26 Jan 2024
MambaByte: Token-free Selective State Space Model
Junxiong Wang
Tushaar Gangavarapu
Jing Nathan Yan
Alexander M. Rush
Mamba
25
34
0
24 Jan 2024
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Feng-Huei Lin
Hanling Yi
Hongbin Li
Yifan Yang
Xiaotian Yu
Guangming Lu
Rong Xiao
32
3
0
23 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
29
246
0
19 Jan 2024
The Synergy of Speculative Decoding and Batching in Serving Large Language Models
Qidong Su
Christina Giannoula
Gennady Pekhimenko
13
10
0
28 Oct 2023
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Yongchao Zhou
Kaifeng Lyu
A. S. Rawat
A. Menon
Afshin Rostamizadeh
Sanjiv Kumar
Jean-François Kagy
Rishabh Agarwal
42
78
0
12 Oct 2023
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
Rishabh Agarwal
Nino Vieillard
Yongchao Zhou
Piotr Stańczyk
Sabela Ramos
Matthieu Geist
Olivier Bachem
35
84
0
23 Jun 2023
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
1