2302.01318
Cited By
Accelerating Large Language Model Decoding with Speculative Sampling
2 February 2023
Charlie Chen
Sebastian Borgeaud
G. Irving
Jean-Baptiste Lespiau
Laurent Sifre
J. Jumper
BDL
LRM
Papers citing "Accelerating Large Language Model Decoding with Speculative Sampling"
50 / 460 papers shown
Planned Diffusion
Daniel Israel
Tian Jin
Ellie Y. Cheng
Guy Van den Broeck
Aditya Grover
Suvinay Subramanian
Michael Carbin
DiffM
208
5
0
27 Mar 2026
Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
Rongxin Cheng
Kai Zhou
Xingda Wei
Siyuan Liu
Mingcong Han
...
Yeju Zhou
Baoquan Zhong
W. L. Xiao
Rong Chen
Haibo Chen
OffRL
LRM
521
0
0
24 Dec 2025
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
Yilong Zhao
Jiaming Tang
Kan Zhu
Zihao Ye
Chi-chih Chang
...
Mohamed S. Abdelfattah
Mingyu Gao
Baris Kasikci
Song Han
Ion Stoica
ReLM
LRM
265
1
0
01 Dec 2025
Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
Pengfei Hu
Meng Cao
Y. Wang
Yi Wang
Jiahua Dong
Jun Song
Yu Cheng
Bo Zheng
Xiaodan Liang
LRM
VLM
190
1
0
30 Nov 2025
Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
Jinze Li
Yixing Xu
Guanchen Li
Shuo Yang
Jinfeng Xu
Xuanwu Yin
Dong Li
Edith C.H. Ngai
E. Barsoum
LRM
131
2
0
28 Nov 2025
DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving
Fengze Yu
Leshu Li
Brad McDanel
Sai Qian Zhang
327
2
0
26 Nov 2025
DiFR: Inference Verification Despite Nondeterminism
Adam Karvonen
Daniel Reuter
Roy Rinberg
Luke Marks
Adrià Garriga-Alonso
Keri Warr
145
1
0
25 Nov 2025
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
Linye Wei
Wenjue Chen
Pingzhi Tang
Xiaotian Guo
Le Ye
Runsheng Wang
Meng Li
AI4CE
148
3
0
24 Nov 2025
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization
Rahul Thomas
Arka Pal
156
1
0
19 Nov 2025
FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation
Tingrui Shen
Yiheng Zhang
Chen Tang
Chuan Ping
Zixing Zhao
Le Wan
Y Samuel Wang
Ronggang Wang
Shengfeng He
AI4CE
436
0
0
19 Nov 2025
Steering Pretrained Drafters during Speculative Decoding
Frédéric Berdoz
Peer Rheinboldt
Roger Wattenhofer
LLMSV
504
0
0
13 Nov 2025
Verifying LLM Inference to Detect Model Weight Exfiltration
Roy Rinberg
Adam Karvonen
Alex Hoover
Daniel Reuter
Keri Warr
177
1
0
04 Nov 2025
When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding
Min Fang
Zhihui Fu
Qibin Zhao
Jun Wang
153
0
0
03 Nov 2025
Democratizing LLM Efficiency: From Hyperscale Optimizations to Universal Deployability
Hen-Hsen Huang
122
0
0
03 Nov 2025
Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding
Jungyeon Koh
H. Yang
171
0
0
03 Nov 2025
TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding
Aditya Sridhar
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
148
0
0
03 Nov 2025
Reject Only Critical Tokens: Pivot-Aware Speculative Decoding
Amir Ziashahabi
Yavuz Faruk Bakman
D. Yaldiz
Mostafa El-Khamy
Sai Praneeth Karimireddy
Salman Avestimehr
137
1
0
01 Nov 2025
SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding
Jameson Sandler
Jacob K Christopher
Thomas Hartvigsen
Ferdinando Fioretto
256
5
0
01 Nov 2025
SpecAttn: Speculating Sparse Attention
Harsh Shah
163
0
0
31 Oct 2025
Polybasic Speculative Decoding Through a Theoretical Perspective
Ruilin Wang
Huixia Li
Yuexiao Ma
Xiawu Zheng
Fei Chao
Xuefeng Xiao
Rongrong Ji
277
0
0
30 Oct 2025
CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs
Zhiyuan Ning
Jiawei Shao
Ruge Xu
Xinfei Guo
Jun Zhang
Chi Zhang
Xuelong Li
172
0
0
30 Oct 2025
ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems
Qiaoling Chen
Zijun Liu
Peng Sun
Shenggui Li
Guoteng Wang
Ziming Liu
Yonggang Wen
Siyuan Feng
Tianwei Zhang
125
4
0
30 Oct 2025
The End of Manual Decoding: Towards Truly End-to-End Language Models
Z. Wang
Dongyang Ma
X. Y. Huang
Deng Cai
Tian Lan
J. Xu
Haitao Mi
Xiaoying Tang
Yan Wang
SyDa
OffRL
476
4
0
30 Oct 2025
Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral
Ayoub Hammal
Pierre Zweigenbaum
Caio Corro
278
0
0
30 Oct 2025
Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation
Zhi-Kai Chen
Jun-Peng Jiang
Han-Jia Ye
De-Chuan Zhan
163
1
0
29 Oct 2025
SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs
Haiduo Huang
Jiangcheng Song
Yadong Zhang
Pengju Ren
195
0
0
28 Oct 2025
MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration
Junhyuk So
Hyunho Kook
Chaeyeon Jang
Eunhyeok Park
164
1
0
28 Oct 2025
Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions
Zongshun Zhang
I. Matta
167
1
0
27 Oct 2025
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Marianne Arriola
Yair Schiff
Hao Phung
Aaron Gokaslan
Volodymyr Kuleshov
196
10
0
26 Oct 2025
Batch Speculative Decoding Done Right
Ranran Haoran Zhang
Soumik Dey
Ashirbad Mishra
Hansi Wu
Binbin Li
Rui Zhang
186
0
0
26 Oct 2025
FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
Divya J. Bajpai
M. Hanawal
MLLM
VLM
254
0
0
26 Oct 2025
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
Yuezhou Hu
Jiaxin Guo
Xinyu Feng
Tuo Zhao
136
5
0
22 Oct 2025
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
Hongyi Liu
Jiaji Huang
Zhen Jia
Youngsuk Park
Yu Wang
OffRL
178
2
0
22 Oct 2025
No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models
Zachary Horvitz
Raghav Singhal
Hao Zou
Carles Domingo-Enrich
Zhou Yu
Rajesh Ranganath
Kathleen McKeown
LRM
195
3
0
22 Oct 2025
Fast Inference via Hierarchical Speculative Decoding
Clara Mohri
Haim Kaplan
Tal Schuster
Yishay Mansour
Amir Globerson
219
0
0
22 Oct 2025
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
Arpan Mukherjee
Marcello Bullo
Debabrota Basu
Deniz Gündüz
164
1
0
21 Oct 2025
EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs
Benjamin Kubwimana
Qijing Huang
LRM
142
2
0
21 Oct 2025
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
Qi Li
Junpan Wu
Xiang Liu
Yuxin Wang
Z. Li
Zhenheng Tang
Yuhan Chen
Shaohuai Shi
Xiaowen Chu
ReLM
LRM
325
1
0
21 Oct 2025
What Limits Agentic Systems Efficiency?
S. Bian
Minghao Yan
Anand Jayarajan
Gennady Pekhimenko
Shivaram Venkataraman
LLMAG
LRM
220
1
0
18 Oct 2025
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Heecheol Yun
Kwangmin Ki
J. H. Lee
Eunho Yang
193
0
0
17 Oct 2025
Synera: Synergistic LLM Serving across Device and Cloud at Scale
Genglin Wang
Liekang Zeng
Bufang Yang
Kaiwei Liu
Guoliang Xing
Chumin Sun
Li Zhou
Jie Sun
Zhenyu Yan
151
0
0
17 Oct 2025
TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs
Sibo Xiao
Jinyuan Fu
Zhongle Xie
Lidan Shou
AI4TS
231
0
0
17 Oct 2025
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
Nikhil Bhendawade
K. Nishu
Arnav Kundu
Chris Bartels
Minsik Cho
Irina Belousova
LRM
423
1
0
15 Oct 2025
Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons
Giovanni Monea
Yair Feldman
Shankar Padmanabhan
Kianté Brantley
Yoav Artzi
237
1
0
15 Oct 2025
3-Model Speculative Decoding
Sanghyun Byun
Mohanad Odema
Jung Guack
Baisub Lee
Jacob Song
Woo Seong Chung
LRM
141
0
0
14 Oct 2025
A Survey on Parallel Reasoning
Z. Wang
Boye Niu
Zipeng Gao
Zhi Zheng
Tong Xu
...
Yilong Chen
Chen Zhu
Hua Wu
Haifeng Wang
Enhong Chen
ReLM
LRM
222
5
0
14 Oct 2025
A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness
Fali Wang
Jihai Chen
Shuhua Yang
Ali Al-Lawati
Linli Tang
Hui Liu
Suhang Wang
237
4
0
14 Oct 2025
DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models
Jinbin Zhang
Nasib Ullah
Erik Schultheis
Rohit Babbar
190
1
0
11 Oct 2025
Placeit! A Framework for Learning Robot Object Placement Skills
Amina Ferrad
J. Huber
Francois Helenon
Julien Gleyze
Mahdi Khoramshahi
Stéphane Doncieux
172
2
0
10 Oct 2025
Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy
Xiaoxiao Ma
Feng Zhao
Pengyang Ling
Haibo Qiu
Zhixiang Wei
Hu Yu
Jie Huang
Zhixiong Zeng
Lin Ma
228
5
0
10 Oct 2025