Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

24 June 2024

Se-Young Yun

Papers citing "Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters"

Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang
04 Oct 2024

Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
16 Feb 2024

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
03 Feb 2024