
Speculative Decoding with Big Little Decoder

Neural Information Processing Systems (NeurIPS), 2023

15 February 2023

Sehoon Kim

Suhong Moon

Papers citing "Speculative Decoding with Big Little Decoder"

50 / 103 papers shown

Scaling LLM Speculative Decoding: Non-Autoregressive Forecasting in Large-Batch Scenarios

189

25 Nov 2025

When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding

109

03 Nov 2025

Reject Only Critical Tokens: Pivot-Aware Speculative Decoding

Sai Praneeth Karimireddy

Salman Avestimehr

113

01 Nov 2025

Polybasic Speculative Decoding Through a Theoretical Perspective

236

30 Oct 2025

Batch Speculative Decoding Done Right

103

26 Oct 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference

Divya J. Bajpai

M. Hanawal

MLLM VLM

211

26 Oct 2025

Fast Inference via Hierarchical Speculative Decoding

194

22 Oct 2025

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

332

15 Oct 2025

MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts

169

14 Oct 2025

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness

186

14 Oct 2025

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

154

06 Oct 2025

Staircase Streaming for Low-Latency Multi-Agent Inference

182

06 Oct 2025

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

174

05 Oct 2025

Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

142

26 Sep 2025

SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

122

26 Sep 2025

ATTS: Asynchronous Test-Time Scaling via Conformal Prediction

...

207

18 Sep 2025

FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction

170

16 Sep 2025

SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning

243

22 Aug 2025

Confidence-Modulated Speculative Decoding for Large Language Models

297

21 Aug 2025

Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation

15 Aug 2025

SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference

216

03 Aug 2025

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

...

297

14 Jul 2025

OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

347

03 Jul 2025

TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

178

14 Jun 2025

Fast ECoT: Efficient Embodied Chain-of-Thought via Thoughts Reuse

277

09 Jun 2025

AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism

309

04 Jun 2025

Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding

300

24 May 2025

KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization

191

22 May 2025

The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute

426

20 May 2025

Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification

296

19 May 2025

Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

519

19 May 2025

Automatic Task Detection and Heterogeneous LLM Speculative Decoding

231

13 May 2025

Efficient Reasoning for LLMs through Speculative Chain-of-Thought

355

27 Apr 2025

Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks

...

1.0K

24 Apr 2025

HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving

467

14 Apr 2025

Understanding and Optimizing Multi-Stage AI Inference Pipelines

1.0K

14 Apr 2025

The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation

302

11 Apr 2025

SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding

357

05 Apr 2025

Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding

Aayush Gautam

Susav Shrestha

Narasimha Annapareddy

491

28 Mar 2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

...

665

104

27 Mar 2025

PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models

Computer Vision and Pattern Recognition (CVPR), 2025

342

25 Mar 2025

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

1.1K

09 Mar 2025

AdaSpec: Adaptive Speculative Decoding for Fast, SLO-Aware Large Language Model Serving

295

07 Mar 2025

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models

306

05 Mar 2025

DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

308

02 Mar 2025

Tutorial Proposal: Speculative Decoding for Efficient LLM Inference

309

01 Mar 2025

Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Maximilian Holsman

Yukun Huang

Bhuwan Dhingra

585

28 Feb 2025

Speculative Decoding and Beyond: An In-Depth Survey of Techniques

754

27 Feb 2025

TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Bryan Kian Hsiang Low

352

21 Feb 2025

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

751

21 Feb 2025