2403.09919
Cited By
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
14 March 2024
Aonan Zhang
Chong-Jun Wang
Yi Wang
Xuanyu Zhang
Yunfei Cheng
Papers citing "Recurrent Drafter for Fast Speculative Decoding in Large Language Models"
3 / 3 papers shown
Title
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade
Irina Belousova
Qichen Fu
Henry Mason
Mohammad Rastegari
Mahyar Najibi
LRM
24
27
0
16 Feb 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Zack Ankner
Rishab Parthasarathy
Aniruddha Nrusimha
Christopher Rinard
Jonathan Ragan-Kelley
William Brandon
4
24
0
07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
120
134
0
03 Feb 2024