Fast Inference from Transformers via Speculative Decoding

International Conference on Machine Learning (ICML), 2023

30 November 2022

Yaniv Leviathan

Matan Kalman

Yossi Matias

LRM

Papers citing "Fast Inference from Transformers via Speculative Decoding"

50 / 763 papers shown

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference

Divya J. Bajpai

M. Hanawal

MLLM VLM

211

26 Oct 2025

Memory-based Language Models: An Efficient, Explainable, and Eco-friendly Approach to Large Language Modeling

25 Oct 2025

Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing

24 Oct 2025

Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation

200

23 Oct 2025

Fast Inference via Hierarchical Speculative Decoding

190

22 Oct 2025

No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models

Zachary Horvitz

Raghav Singhal

Hao Zou

Carles Domingo-Enrich

149

22 Oct 2025

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

100

22 Oct 2025

Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs

138

22 Oct 2025

EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs

Benjamin Kubwimana

Qijing Huang

LRM

113

21 Oct 2025

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

Yoshinari Fujinuma

ELM

109

21 Oct 2025

132

20 Oct 2025

Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey

365

20 Oct 2025

What Limits Agentic Systems Efficiency?

Shivaram Venkataraman

LLMAG LRM

143

18 Oct 2025

TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs

165

17 Oct 2025

Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models

Jonas Geiping

Xinyu Yang

Guinan Su

121

16 Oct 2025

Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing

Tianhua Xia

Sai Qian Zhang

16 Oct 2025

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

329

15 Oct 2025

On the Reasoning Abilities of Masked Diffusion Language Models

Anej Svete

Ashish Sabharwal

DiffM LRM

111

15 Oct 2025

Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons

193

15 Oct 2025

A Survey on Parallel Reasoning

...

181

14 Oct 2025

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness

185

14 Oct 2025

3-Model Speculative Decoding

14 Oct 2025

DND: Boosting Large Language Models with Dynamic Nested Depth

234

13 Oct 2025

Direct Multi-Token Decoding

103

13 Oct 2025

DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models

133

11 Oct 2025

Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding

126

11 Oct 2025

Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy

176

10 Oct 2025

ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers

132

10 Oct 2025

Logit Arithmetic Elicits Long Reasoning Capabilities Without Training

Xinliang Frederick Zhang

Farima Fatahi Bayat

L. Wang

RALM LRM

103

10 Oct 2025

Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation

132

10 Oct 2025

Placeit! A Framework for Learning Robot Object Placement Skills

120

10 Oct 2025

Scaling Laws for Code: A More Data-Hungry Regime

111

09 Oct 2025

AdaSwitch: Adaptive Switching Generation for Knowledge Distillation

09 Oct 2025

Lossless Vocabulary Reduction for Auto-Regressive Language Models

104

09 Oct 2025

Beyond independent component analysis: identifiability and algorithms

08 Oct 2025

lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models

114

07 Oct 2025

Staircase Streaming for Low-Latency Multi-Agent Inference

182

06 Oct 2025

SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

146

06 Oct 2025

Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding

Shrenik Bhansali

Larry Heck

OffRL

06 Oct 2025

Drax: Speech Recognition with Discrete Flow Matching

130

05 Oct 2025

Speculative Actions: A Lossless Framework for Faster Agentic Systems

188

05 Oct 2025

Self Speculative Decoding for Diffusion Large Language Models

320

05 Oct 2025

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

168

05 Oct 2025

Self-Speculative Masked Diffusions

155

04 Oct 2025

Action Deviation-Aware Inference for Low-Latency Wireless Robots

171

03 Oct 2025

The Disparate Impacts of Speculative Decoding

132

02 Oct 2025

FlashResearch: Real-time Agent Orchestration for Efficient Deep Research

121

02 Oct 2025

Optimal Stopping vs Best-of-N for Inference Time Optimization

Y. Kalayci

Vinod Raman

S. Dughmi

122

01 Oct 2025

HiSpec: Hierarchical Speculative Decoding for LLMs

Avinash Kumar

Sujay Sanghavi

Poulami Das

119

01 Oct 2025

Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

Shutong Wu

Jiawei Zhang

DiffM

314

30 Sep 2025