2403.00858
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
29 February 2024
Raghavv Goel
Mukul Gagrani
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
ALM
Papers citing
"Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs"
4 / 4 papers shown
Title
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park
Dalton Jones
Matt Morse
Raghavv Goel
Mingu Lee
Chris Lott
22
0
0
21 Apr 2025
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
Hyun Ryu
Eric Kim
72
3
0
20 Nov 2024
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
Sudhanshu Agrawal
Wonseok Jeon
Mingu Lee
16
0
0
24 Oct 2024
On Speculative Decoding for Multimodal Large Language Models
Mukul Gagrani
Raghavv Goel
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
LRM
19
6
0
13 Apr 2024