2211.17192
Fast Inference from Transformers via Speculative Decoding
International Conference on Machine Learning (ICML), 2023
30 November 2022
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
Papers citing "Fast Inference from Transformers via Speculative Decoding"
50 / 763 papers shown
Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
Rongxin Cheng
Kai Zhou
Xingda Wei
Siyuan Liu
Mingcong Han
...
Yeju Zhou
Baoquan Zhong
W. L. Xiao
Rong Chen
Haibo Chen
OffRL
LRM
437
0
0
24 Dec 2025
SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification
Zhendong Tan
Xingjun Zhang
Chaoyi Hu
Junjie Peng
Kun Xia
LRM
119
0
0
02 Dec 2025
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
Yilong Zhao
Jiaming Tang
Kan Zhu
Zihao Ye
Chi-chih Chang
...
Mohamed S. Abdelfattah
Mingyu Gao
Baris Kasikci
Song Han
Ion Stoica
ReLM
LRM
190
1
0
01 Dec 2025
Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
Pengfei Hu
Meng Cao
Y. Wang
Yi Wang
Jiahua Dong
Jun Song
Yu Cheng
Bo Zheng
Xiaodan Liang
LRM
VLM
143
0
0
30 Nov 2025
DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving
Fengze Yu
Leshu Li
Brad McDanel
Sai Qian Zhang
208
1
0
26 Nov 2025
DiFR: Inference Verification Despite Nondeterminism
Adam Karvonen
Daniel Reuter
Roy Rinberg
Luke Marks
Adrià Garriga-Alonso
Keri Warr
102
0
0
25 Nov 2025
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
Zixiao Huang
Wen Zeng
Tianyu Fu
Tengxuan Liu
Yizhou Sun
...
Y. Li
Quanlu Zhang
Guohao Dai
Zhenhua Zhu
Yu Wang
LRM
160
0
0
25 Nov 2025
FREE: Uncertainty-Aware Autoregression for Parallel Diffusion Transformers
Xinwan Wen
Bowen Li
Jiajun Luo
Ye Li
Zhi Wang
VGen
126
0
0
25 Nov 2025
A note on the impossibility of conditional PAC-efficient reasoning in large language models
Hao Zeng
LRM
72
0
0
25 Nov 2025
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
Linye Wei
Wenjue Chen
Pingzhi Tang
Xiaotian Guo
Le Ye
Runsheng Wang
Meng Li
AI4CE
106
0
0
24 Nov 2025
CDLM: Consistency Diffusion Language Models For Faster Sampling
Minseo Kim
Chenfeng Xu
Coleman Hooper
Harman Singh
Ben Athiwaratkun
Ce Zhang
Kurt Keutzer
Amir Gholami
198
0
0
24 Nov 2025
NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations
Y. Wang
Shengyu Zhou
Jinyu Lu
Ziwei Liu
Langming Liu
...
Feng Li
Wenbo Su
Pengjie Wang
Jian Xu
Xiangyu Zhao
172
0
0
24 Nov 2025
Sphinx: Efficiently Serving Novel View Synthesis using Regression-Guided Selective Refinement
Yuchen Xia
Souvik Kundu
Mosharaf Chowdhury
Nishil Talati
DiffM
186
0
0
24 Nov 2025
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
Haojin Yang
Rui Hu
Zequn Sun
Rui Zhou
Yujun Cai
Yiwei Wang
DiffM
85
0
0
22 Nov 2025
Accelerating Time Series Foundation Models with Speculative Decoding
Pranav Subbaraman
Fang Sun
Yue Yao
Huacong Tang
Xiao Luo
Yizhou Sun
AI4TS
252
0
0
22 Nov 2025
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Qinghao Hu
S. Yang
Junxian Guo
Xiaozhe Yao
Yujun Lin
Yuxian Gu
Han Cai
Chuang Gan
Ana Klimovic
Song Han
OffRL
AI4TS
LRM
122
2
0
20 Nov 2025
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization
Rahul Thomas
Arka Pal
111
0
0
19 Nov 2025
Beat the long tail: Distribution-Aware Speculative Decoding for RL Training
Zelei Shao
Vikranth Srivatsa
S. Srivastava
Qingyang Wu
Alpay Ariyak
...
Ce Zhang
Yiying Zhang
Ben Athiwaratkun
Chenfeng Xu
Junxiong Wang
OffRL
190
2
0
17 Nov 2025
Global Cross-Time Attention Fusion for Enhanced Solar Flare Prediction from Multivariate Time Series
Onur Vural
S. M. Hamdi
S. F. Boubrahimi
AI4TS
135
0
0
17 Nov 2025
CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference
Dong Liu
Yanxuan Yu
Ben Lengerich
55
2
0
16 Nov 2025
Cacheback: Speculative Decoding With Nothing But Cache
Zhiyao Ma
In Gim
Lin Zhong
BDL
196
0
0
15 Nov 2025
Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput
Jingwei Song
Wanyi Chen
Xinyuan Song
Chris Tong
Gufeng Chen
Tianyi Zhao
Eric Yang
Bill Shi
Lynn Ai
69
0
0
13 Nov 2025
Steering Pretrained Drafters during Speculative Decoding
Frédéric Berdoz
Peer Rheinboldt
Roger Wattenhofer
LLMSV
437
0
0
13 Nov 2025
Test-time Diverse Reasoning by Riemannian Activation Steering
Ly Tran Ho Khanh
Dongxuan Zhu
Man-Chung Yue
Viet Anh Nguyen
LLMSV
286
0
0
11 Nov 2025
ConvFill: Model Collaboration for Responsive Conversational Voice Agents
Vidya Srinivas
Zachary Englhardt
Maximus Powers
Shwetak N. Patel
Vikram Iyer
84
0
0
10 Nov 2025
Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning
Sangmook Lee
Dohyung Kim
Hyukhun Koh
Nakyeong Yang
Kyomin Jung
LRM
147
0
0
09 Nov 2025
Verifying LLM Inference to Detect Model Weight Exfiltration
Roy Rinberg
Adam Karvonen
Alex Hoover
Daniel Reuter
Keri Warr
AAML
123
1
0
04 Nov 2025
Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding
Jungyeon Koh
H. Yang
120
0
0
03 Nov 2025
Democratizing LLM Efficiency: From Hyperscale Optimizations to Universal Deployability
Hen-Hsen Huang
97
0
0
03 Nov 2025
TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding
Aditya Sridhar
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
112
0
0
03 Nov 2025
When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding
Min Fang
Zhihui Fu
Qibin Zhao
Jun Wang
109
0
0
03 Nov 2025
FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management
Nazmul Takbir
Hamidreza Alikhani
N. Dutt
Sangeetha Abdu Jyothi
74
0
0
02 Nov 2025
Reject Only Critical Tokens: Pivot-Aware Speculative Decoding
Amir Ziashahabi
Yavuz Faruk Bakman
D. Yaldiz
Mostafa El-Khamy
Sai Praneeth Karimireddy
Salman Avestimehr
109
1
0
01 Nov 2025
SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding
Jameson Sandler
Jacob K Christopher
Thomas Hartvigsen
Ferdinando Fioretto
177
1
0
01 Nov 2025
SpecAttn: Speculating Sparse Attention
Harsh Shah
100
0
0
31 Oct 2025
Continuous Autoregressive Language Models
Chenze Shao
Darren Li
Fandong Meng
Jie Zhou
KELM
318
0
0
31 Oct 2025
ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems
Qiaoling Chen
Zijun Liu
Peng Sun
Shenggui Li
Guoteng Wang
Ziming Liu
Yonggang Wen
Siyuan Feng
Tianwei Zhang
104
3
0
30 Oct 2025
CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs
Zhiyuan Ning
Jiawei Shao
Ruge Xu
Xinfei Guo
Jun Zhang
Chi Zhang
Xuelong Li
123
0
0
30 Oct 2025
Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral
Ayoub Hammal
Pierre Zweigenbaum
Caio Corro
238
0
0
30 Oct 2025
Polybasic Speculative Decoding Through a Theoretical Perspective
Ruilin Wang
Huixia Li
Yuexiao Ma
Xiawu Zheng
Fei Chao
Xuefeng Xiao
Rongrong Ji
236
0
0
30 Oct 2025
The End of Manual Decoding: Towards Truly End-to-End Language Models
Z. Wang
Dongyang Ma
X. Y. Huang
Deng Cai
Tian Lan
J. Xu
Haitao Mi
Xiaoying Tang
Yan Wang
SyDa
OffRL
417
0
0
30 Oct 2025
Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation
Zhi-Kai Chen
Jun-Peng Jiang
Han-Jia Ye
De-Chuan Zhan
137
1
0
29 Oct 2025
NetEcho: From Real-World Streaming Side-Channels to Full LLM Conversation Recovery
Z. Zhang
Guanlong Wu
Sen Deng
Shuai Wang
Y. Zhang
140
0
0
29 Oct 2025
SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs
Haiduo Huang
Jiangcheng Song
Yadong Zhang
Pengju Ren
137
0
0
28 Oct 2025
MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration
Junhyuk So
Hyunho Kook
Chaeyeon Jang
Eunhyeok Park
132
0
0
28 Oct 2025
BitSkip: An Empirical Analysis of Quantization and Early Exit Composition
Ramshankar Bhuvaneswaran
Handan Liu
MQ
249
0
0
27 Oct 2025
Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions
Zongshun Zhang
I. Matta
120
0
0
27 Oct 2025
FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
Divya J. Bajpai
M. Hanawal
MLLM
VLM
211
0
0
26 Oct 2025
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Marianne Arriola
Yair Schiff
Hao Phung
Aaron Gokaslan
Volodymyr Kuleshov
143
1
0
26 Oct 2025
Chitchat with AI: Understand the supply chain carbon disclosure of companies worldwide through Large Language Model
Haotian Hang
Yueyang Shen
Vicky Zhu
Jose Cruz
Michelle Li
79
0
0
26 Oct 2025