S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models

2 July 2024

Ali Ghodsi

Boxing Chen

Mehdi Rezagholizadeh

Papers citing "S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models"

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang. 03 Feb 2024.

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference
Parsa Kavehzadeh, Mojtaba Valipour, Marzieh S. Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh. 16 Sep 2023.