Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.05109
Cited By
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
7 February 2024
Zack Ankner
Rishab Parthasarathy
Aniruddha Nrusimha
Christopher Rinard
Jonathan Ragan-Kelley
William Brandon
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding"
23 / 23 papers shown
Title
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Z. Liu
Dong Li
E. Barsoum
54
0
0
23 Apr 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
J. Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
50
0
0
13 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
109
3
0
03 Mar 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
S. Zhang
85
0
0
27 Feb 2025
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
Tong Wu
Junzhe Shen
Zixia Jia
Y. Wang
Zilong Zheng
75
0
0
26 Feb 2025
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Tian Jin
Ellie Y. Cheng
Zack Ankner
Nikunj Saunshi
Blake M. Elias
Amir Yazdanbakhsh
Jonathan Ragan-Kelley
Suvinay Subramanian
Michael Carbin
49
2
0
24 Feb 2025
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu
Jingyang Li
Xingyu Xie
Zhihui Lu
Kim-Chuan Toh
Pan Zhou
38
0
0
16 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
82
0
0
04 Feb 2025
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
Zhuofan Wen
Shangtong Gui
Yang Feng
93
2
0
25 Nov 2024
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
Hyun Ryu
Eric Kim
72
3
0
20 Nov 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
42
4
0
09 Oct 2024
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
68
4
0
04 Oct 2024
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
Doohyuk Jang
Sihwan Park
J. Yang
Yeonsung Jung
Jihun Yun
Souvik Kundu
Sung-Yub Kim
Eunho Yang
40
7
0
04 Oct 2024
Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation
Lujun Gui
Bin Xiao
Lei Su
Weipeng Chen
22
2
0
28 Aug 2024
KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning
Kaiqi Zhang
Jing Zhao
Rui Chen
29
1
0
15 Aug 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
82
1
0
24 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
31
5
0
30 May 2024
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
Yunsheng Ni
Chuanjian Liu
Yehui Tang
Kai Han
Yunhe Wang
21
0
0
13 May 2024
Accelerating Production LLMs with Combined Token/Embedding Speculators
Davis Wertheimer
Joshua Rosenkranz
Thomas Parnell
Sahil Suneja
Pavithra Ranganathan
R. Ganti
M. Srivatsa
30
3
0
29 Apr 2024
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu
Yehui Tang
Zhenhua Liu
Yunsheng Ni
Kai Han
Yunhe Wang
30
23
0
29 Apr 2024
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun
Zhuoming Chen
Xinyu Yang
Yuandong Tian
Beidi Chen
33
46
0
18 Apr 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
123
134
0
03 Feb 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
141
365
0
13 Mar 2023
1