KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization

22 May 2025
Mingbo Song, Heming Xia, Jun Zhang, Chak Tou Leong, Qiancheng Xu, Wenjie Li, Sujian Li
Main: 8 pages · Appendix: 3 pages · Bibliography: 3 pages · 10 figures · 6 tables
Abstract

Speculative Decoding (SD) has emerged as a widely used paradigm to accelerate the inference of large language models (LLMs) without compromising generation quality. It works by efficiently drafting multiple tokens with a compact model and then verifying them in parallel with the target LLM. Notably, Self-Speculative Decoding constructs the draft model by skipping certain layers of the target LLM, which eliminates the need for additional parameters or training. Despite its strengths, we observe in this work that drafting with layer skipping is highly sensitive to domain shifts, leading to a substantial drop in acceleration performance. To enhance the domain generalizability of this paradigm, we introduce KNN-SSD, an algorithm that leverages K-Nearest Neighbor (KNN) search to match different skipped-layer sets with various domain inputs. We evaluated our algorithm across various models and multiple tasks, observing that its application leads to a 1.3x-1.6x speedup in LLM inference.
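
The following is a minimal sketch (not the authors' code) of the idea described above: a KNN lookup routes each incoming prompt to the skipped-layer set pre-optimized for its nearest stored domain exemplar, which is then used for self-speculative drafting. The helper names (embed_fn, draft_with_skipped_layers) and the exemplar library are hypothetical placeholders.

import numpy as np

class KNNLayerSetSelector:
    """Pick a skipped-layer set for a prompt via nearest-neighbor search
    over precomputed domain exemplar embeddings (illustrative sketch)."""

    def __init__(self, exemplar_embeddings, layer_sets, k=1):
        # exemplar_embeddings: (N, d) array, one embedding per representative
        # prompt; layer_sets[i] is the layer-skip set tuned for exemplar i's domain.
        self.exemplars = np.asarray(exemplar_embeddings, dtype=np.float32)
        self.layer_sets = layer_sets
        self.k = k

    def select(self, query_embedding):
        # Euclidean nearest-neighbor search; with k > 1 a vote over the
        # neighbors' layer sets could be used instead of taking the closest.
        q = np.asarray(query_embedding, dtype=np.float32)
        dists = np.linalg.norm(self.exemplars - q, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return self.layer_sets[int(nearest[0])]

# Usage sketch (hypothetical helpers):
#   selector = KNNLayerSetSelector(domain_embeddings, domain_layer_sets)
#   skip_set = selector.select(embed_fn(prompt))
#   output = draft_with_skipped_layers(model, prompt, skip_set)  # draft + verify

The design choice here mirrors the abstract: because no single skipped-layer set generalizes across domains, the selector amortizes the cost of layer-set optimization over a small exemplar library and adapts per input at negligible runtime cost.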

@article{song2025_2505.16162,
  title={KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization},
  author={Mingbo Song and Heming Xia and Jun Zhang and Chak Tou Leong and Qiancheng Xu and Wenjie Li and Sujian Li},
  journal={arXiv preprint arXiv:2505.16162},
  year={2025}
}