Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.08461
Cited By
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
12 October 2023
Yongchao Zhou
Kaifeng Lyu
A. S. Rawat
A. Menon
Afshin Rostamizadeh
Sanjiv Kumar
Jean-François Kagy
Rishabh Agarwal
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DistillSpec: Improving Speculative Decoding via Knowledge Distillation"
13 / 13 papers shown
Title
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
36
0
0
08 May 2025
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Z. Wang
Muneeza Azmart
Ang Li
R. Horesh
Mikhail Yurochkin
107
0
0
11 Feb 2025
Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models
Seungeun Oh
Jinhyuk Kim
Jihong Park
Seung-Woo Ko
Tony Q. S. Quek
Seong-Lyun Kim
63
3
0
17 Dec 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
42
4
0
09 Oct 2024
Efficient Inference for Large Language Model-based Generative Recommendation
Xinyu Lin
Chaoqun Yang
Wenjie Wang
Yongqi Li
Cunxiao Du
Fuli Feng
See-Kiong Ng
Tat-Seng Chua
62
4
0
07 Oct 2024
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
71
4
0
04 Oct 2024
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Tianyu Liu
Yun Li
Qitan Lv
Kai Liu
Jianchen Zhu
Winston Hu
X. Sun
39
10
0
13 Aug 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Zack Ankner
Rishab Parthasarathy
Aniruddha Nrusimha
Christopher Rinard
Jonathan Ragan-Kelley
William Brandon
17
24
0
07 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
26
114
0
26 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
29
243
0
19 Jan 2024
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Tao Ge
Heming Xia
Xin Sun
Si-Qing Chen
Furu Wei
82
18
0
20 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
170
3,504
0
10 Jun 2015
1