SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

9 October 2024

Yongqi Li

Wenjie Li

Papers citing "SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration"

PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An, Huajun Bai, Z. Liu, Dong Li, E. Barsoum
48 0 0, 23 Apr 2025

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram
LRM, 21 0 0, 08 Apr 2025

Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, S. Zhang
82 0 0, 27 Feb 2025