Fast Inference from Transformers via Speculative Decoding

International Conference on Machine Learning (ICML), 2023

30 November 2022

Yaniv Leviathan

Matan Kalman

Yossi Matias

LRM

Papers citing "Fast Inference from Transformers via Speculative Decoding"

50 / 763 papers shown

Fast LLM Post-training via Decoupled and Fastest-of-N Speculation

...

437

24 Dec 2025

SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification

119

02 Dec 2025

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

...

Mohamed S. Abdelfattah

190

01 Dec 2025

Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding

143

30 Nov 2025

DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

208

26 Nov 2025

DiFR: Inference Verification Despite Nondeterminism

102

25 Nov 2025

Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design

...

160

25 Nov 2025

FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers

126

25 Nov 2025

A note on the impossibility of conditional PAC-efficient reasoning in large language models

Hao Zeng

LRM

25 Nov 2025

Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models

106

24 Nov 2025

CDLM: Consistency Diffusion Language Models For Faster Sampling

198

24 Nov 2025

NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations

...

172

24 Nov 2025

Sphinx: Efficiently Serving Novel View Synthesis using Regression-Guided Selective Refinement

186

24 Nov 2025

WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning

22 Nov 2025

Accelerating Time Series Foundation Models with Speculative Decoding

252

22 Nov 2025

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

122

20 Nov 2025

Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization

Rahul Thomas

Arka Pal

111

19 Nov 2025

Beat the long tail: Distribution-Aware Speculative Decoding for RL Training

...

190

17 Nov 2025

Global Cross-Time Attention Fusion for Enhanced Solar Flare Prediction from Multivariate Time Series

135

17 Nov 2025

CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference

Dong Liu

Yanxuan Yu

Ben Lengerich

16 Nov 2025

Cacheback: Speculative Decoding With Nothing But Cache

196

15 Nov 2025

Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput

13 Nov 2025

Steering Pretrained Drafters during Speculative Decoding

437

13 Nov 2025

Test-time Diverse Reasoning by Riemannian Activation Steering

286

11 Nov 2025

ConvFill: Model Collaboration for Responsive Conversational Voice Agents

10 Nov 2025

Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning

147

09 Nov 2025

Verifying LLM Inference to Detect Model Weight Exfiltration

123

04 Nov 2025

Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding

Jungyeon Koh

H. Yang

120

03 Nov 2025

Democratizing LLM Efficiency: From Hyperscale Optimizations to Universal Deployability

Hen-Hsen Huang

03 Nov 2025

TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding

112

03 Nov 2025

When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding

109

03 Nov 2025

FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management

Nazmul Takbir

Hamidreza Alikhani

N. Dutt

Sangeetha Abdu Jyothi

02 Nov 2025

Reject Only Critical Tokens: Pivot-Aware Speculative Decoding

Sai Praneeth Karimireddy

Salman Avestimehr

109

01 Nov 2025

SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding

177

01 Nov 2025

SpecAttn: Speculating Sparse Attention

Harsh Shah

100

31 Oct 2025

Continuous Autoregressive Language Models

318

31 Oct 2025

ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems

104

30 Oct 2025

CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs

123

30 Oct 2025

Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral

Ayoub Hammal

Pierre Zweigenbaum

Caio Corro

238

30 Oct 2025

Polybasic Speculative Decoding Through a Theoretical Perspective

236

30 Oct 2025

The End of Manual Decoding: Towards Truly End-to-End Language Models

417

30 Oct 2025

Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation

137

29 Oct 2025

NetEcho: From Real-World Streaming Side-Channels to Full LLM Conversation Recovery

140

29 Oct 2025

SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs

137

28 Oct 2025

MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration

132

28 Oct 2025

BitSkip: An Empirical Analysis of Quantization and Early Exit Composition

Ramshankar Bhuvaneswaran

Handan Liu

249

27 Oct 2025

Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions

Zongshun Zhang

I. Matta

120

27 Oct 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference

Divya J. Bajpai

M. Hanawal

MLLM VLM

211

26 Oct 2025

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

143

26 Oct 2025

Chitchat with AI: Understand the supply chain carbon disclosure of companies worldwide through Large Language Model

26 Oct 2025