Accelerating Production LLMs with Combined Token/Embedding Speculators

29 April 2024

Papers citing "Accelerating Production LLMs with Combined Token/Embedding Speculators"

3 / 3 papers shown

Title
Mixture of Attentions For Speculative Decoding Matthieu Zimmer Milan Gritta Gerasimos Lampouras Haitham Bou Ammar Jun Wang 63 4 0 04 Oct 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding Zack Ankner Rishab Parthasarathy Aniruddha Nrusimha Christopher Rinard Jonathan Ragan-Kelley William Brandon 4 24 0 07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding Yichao Fu Peter Bailis Ion Stoica Hao Zhang 120 134 0 03 Feb 2024