2404.19124
Cited By
Accelerating Production LLMs with Combined Token/Embedding Speculators
29 April 2024
Davis Wertheimer
Joshua Rosenkranz
Thomas Parnell
Sahil Suneja
Pavithra Ranganathan
R. Ganti
M. Srivatsa
Papers citing
"Accelerating Production LLMs with Combined Token/Embedding Speculators"
3 / 3 papers shown
Title
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
63
4
0
04 Oct 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Zack Ankner
Rishab Parthasarathy
Aniruddha Nrusimha
Christopher Rinard
Jonathan Ragan-Kelley
William Brandon
4
24
0
07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
120
134
0
03 Feb 2024