arXiv:2302.01318

Accelerating Large Language Model Decoding with Speculative Sampling

2 February 2023

Charlie Chen

Sebastian Borgeaud

Jean-Baptiste Lespiau

Papers citing "Accelerating Large Language Model Decoding with Speculative Sampling"

50 / 460 papers shown

Planned Diffusion

Guy Van den Broeck

Suvinay Subramanian

208

5

0

27 Mar 2026

Fast LLM Post-training via Decoupled and Fastest-of-N Speculation

...

521

0

0

24 Dec 2025

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

...

Mohamed S. Abdelfattah

265

1

0

01 Dec 2025

Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding

190

1

0

30 Nov 2025

Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match

131

2

0

28 Nov 2025

DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

327

2

0

26 Nov 2025

DiFR: Inference Verification Despite Nondeterminism

Adrià Garriga-Alonso

145

1

0

25 Nov 2025

Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models

148

3

0

24 Nov 2025

Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization

156

1

0

19 Nov 2025

FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation

436

0

0

19 Nov 2025

Steering Pretrained Drafters during Speculative Decoding

Frédéric Berdoz

Peer Rheinboldt

Roger Wattenhofer

504

0

0

13 Nov 2025

Verifying LLM Inference to Detect Model Weight Exfiltration

177

1

0

04 Nov 2025

When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding

153

0

0

03 Nov 2025

Democratizing LLM Efficiency: From Hyperscale Optimizations to Universal Deployability

122

0

0

03 Nov 2025

Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding

171

0

0

03 Nov 2025

TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding

Nish Sinnadurai

Vithursan Thangarasa

148

0

0

03 Nov 2025

Reject Only Critical Tokens: Pivot-Aware Speculative Decoding

Amir Ziashahabi

Yavuz Faruk Bakman

Mostafa El-Khamy

Sai Praneeth Karimireddy

Salman Avestimehr

137

1

0

01 Nov 2025

SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding

Jameson Sandler

Jacob K Christopher

Thomas Hartvigsen

Ferdinando Fioretto

256

5

0

01 Nov 2025

SpecAttn: Speculating Sparse Attention

163

0

0

31 Oct 2025

Polybasic Speculative Decoding Through a Theoretical Perspective

277

0

0

30 Oct 2025

CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs

172

0

0

30 Oct 2025

ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems

125

4

0

30 Oct 2025

The End of Manual Decoding: Towards Truly End-to-End Language Models

476

4

0

30 Oct 2025

Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral

Pierre Zweigenbaum

278

0

0

30 Oct 2025

Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation

163

1

0

29 Oct 2025

SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs

Jiangcheng Song

195

0

0

28 Oct 2025

MC-SJD : Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration

164

1

0

28 Oct 2025

Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions

167

1

0

27 Oct 2025

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Marianne Arriola

Volodymyr Kuleshov

196

10

0

26 Oct 2025

Batch Speculative Decoding Done Right

Ranran Haoran Zhang

Ashirbad Mishra

186

0

0

26 Oct 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference

Divya J. Bajpai

254

0

0

26 Oct 2025

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

136

5

0

22 Oct 2025

Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs

178

2

0

22 Oct 2025

No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models

Zachary Horvitz

Carles Domingo-Enrich

Rajesh Ranganath

Kathleen McKeown

195

3

0

22 Oct 2025

Fast Inference via Hierarchical Speculative Decoding

219

0

0

22 Oct 2025

Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality

Arpan Mukherjee

164

1

0

21 Oct 2025

EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs

Benjamin Kubwimana

142

2

0

21 Oct 2025

Reasoning Language Model Inference Serving Unveiled: An Empirical Study

325

1

0

21 Oct 2025

What Limits Agentic Systems Efficiency?

Anand Jayarajan

Gennady Pekhimenko

Shivaram Venkataraman

220

1

0

18 Oct 2025

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

193

0

0

17 Oct 2025

Synera: Synergistic LLM Serving across Device and Cloud at Scale

151

0

0

17 Oct 2025

TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs

231

0

0

17 Oct 2025

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

Nikhil Bhendawade

Irina Belousova

423

1

0

15 Oct 2025

Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons

Shankar Padmanabhan

Kianté Brantley

237

1

0

15 Oct 2025

3-Model Speculative Decoding

Woo Seong Chung

141

0

0

14 Oct 2025

A Survey on Parallel Reasoning

...

222

5

0

14 Oct 2025

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness

237

4

0

14 Oct 2025

DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models

Erik Schultheis

190

1

0

11 Oct 2025

Placeit! A Framework for Learning Robot Object Placement Skills

Francois Helenon

Mahdi Khoramshahi

Stéphane Doncieux

172

2

0

10 Oct 2025

Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy

228

5

0

10 Oct 2025

Page 1 of 10