Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs

29 February 2024

Papers citing "Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs"


KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments. Junyoung Park, Dalton Jones, Matt Morse, Raghavv Goel, Mingu Lee, Chris Lott. 21 Apr 2025.
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding. Hyun Ryu, Eric Kim. 20 Nov 2024.
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability. Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee. 24 Oct 2024.
On Speculative Decoding for Multimodal Large Language Models. Mukul Gagrani, Raghavv Goel, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott. 13 Apr 2024.