2211.17192
Fast Inference from Transformers via Speculative Decoding
International Conference on Machine Learning (ICML), 2023
30 November 2022
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
Papers citing
"Fast Inference from Transformers via Speculative Decoding"
50 / 763 papers shown
SpecExit: Accelerating Large Reasoning Model via Speculative Exit
Rubing Yang
Huajun Bai
Song Liu
Guanghua Yu
Runzhi Fan
Yanbin Dang
Jiejing Zhang
Kai Liu
Jianchen Zhu
Peng Chen
ReLM
LRM
115
0
0
29 Sep 2025
Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
Wenrui Bao
Zhiben Chen
Dan Xu
Yuzhang Shang
197
0
0
29 Sep 2025
Learning to Ponder: Adaptive Reasoning in Latent Space
Yixin He
Lumingyuan Tang
LRM
112
2
0
29 Sep 2025
Intra-request branch orchestration for efficient LLM reasoning
Weifan Jiang
Rana Shahout
Yilun Du
Michael Mitzenmacher
Minlan Yu
LRM
116
0
0
29 Sep 2025
Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding
Sungkyun Kim
Jaemin Kim
Dogyung Yoon
Jiho Shin
Junyeol Lee
Jiwon Seo
139
0
0
29 Sep 2025
HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models
Zhinan Xie
Peisong Wang
Jian Cheng
VLM
157
0
0
28 Sep 2025
DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
Guanghao Li
Zhihui Fu
Min Fang
Qibin Zhao
Ming Tang
Chun Yuan
Jun Wang
126
4
0
28 Sep 2025
SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
Bingshuai Liu
Ante Wang
Zijun Min
Liang Yao
Haibo Zhang
Yang Liu
Anxiang Zeng
Jinsong Su
OffRL
LRM
200
5
0
27 Sep 2025
Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
Zixu Hao
Jianyu Wei
Tuowei Wang
Minxing Huang
Huiqiang Jiang
Shiqi Jiang
Ting Cao
Ju Ren
269
1
0
27 Sep 2025
Not only a helper, but also a teacher: Interactive LLM Cascade
Yu Wu
Shuo Wu
Ye Tao
Yansong Li
Anand Sarwate
RALM
168
0
0
26 Sep 2025
SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
Kanghoon Yoon
Minsub Kim
Sungjae Lee
Joonhyung Lee
Sunghyeon Woo
Yeonjun In
S. Kwon
Chanyoung Park
Dongsoo Lee
120
1
0
26 Sep 2025
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Shijing Hu
Jingyang Li
Zhihui Lu
Pan Zhou
142
0
0
26 Sep 2025
Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation
Xunzhu Tang
Iyiola Emmanuel Olatunji
Tiezhu Sun
Jacques Klein
Tegawende F. Bissyande
LRM
85
1
0
26 Sep 2025
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
Yizhou Zhang
Ning Lv
T. Wang
Jisheng Dang
OffRL
LRM
128
1
0
26 Sep 2025
Infusing Theory of Mind into Socially Intelligent LLM Agents
EunJeong Hwang
Yuwei Yin
Giuseppe Carenini
Peter West
Vered Shwartz
LLMAG
1.6K
1
0
26 Sep 2025
SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding
Thomas Walton
Darin Tsui
Aryan Musharaf
Amirali Aghazadeh
105
0
0
25 Sep 2025
Enabling Approximate Joint Sampling in Diffusion LMs
Parikshit Bansal
Sujay Sanghavi
DiffM
111
1
0
25 Sep 2025
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
Sasha Cui
Zhongren Chen
LLMSV
238
1
0
25 Sep 2025
FastEagle: Cascaded Drafting for Accelerating Speculative Decoding
Haiduo Huang
Jiangcheng Song
Wenzhe Zhao
Pengju Ren
111
0
0
24 Sep 2025
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
Y. Zhou
Jiajun Li
Yusheng Su
Gowtham Ramesh
Zilin Zhu
...
Jiang Liu
Qiaolin Yu
Hao Chen
Zicheng Liu
Emad Barsoum
OffRL
364
4
0
23 Sep 2025
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
Yanzuo Lu
Xin Xia
Manlin Zhang
Huafeng Kuang
Jianbin Zheng
Yuxi Ren
Xuefeng Xiao
191
6
0
23 Sep 2025
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Sudhanshu Agrawal
Risheek Garrepalli
Raghavv Goel
Mingu Lee
Christopher Lott
Fatih Porikli
222
6
0
22 Sep 2025
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
Pei-Shuo Wang
Jian-Jia Chen
Chun-Che Yang
Chi-chih Chang
N. Huang
Mohamed S. Abdelfattah
Kai-Chiang Wu
MQ
215
0
0
22 Sep 2025
Randomized Smoothing Meets Vision-Language Models
Emmanouil Seferis
Changshun Wu
Stefanos D. Kollias
Saddek Bensalem
Chih-Hong Cheng
AAML
123
0
0
19 Sep 2025
Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding
Ruanjun Li
Ziheng Liu
Yuanming Shi
Jiawei Shao
Chi Zhang
Xuelong Li
154
0
0
19 Sep 2025
Direct Simultaneous Translation Activation for Large Audio-Language Models
Pei Zhang
Yiming Wang
Jialong Tang
Baosong Yang
Rui Wang
Yang Li
Fei Huang
102
0
0
19 Sep 2025
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Jing Xiong
Qiujiang Chen
Fanghua Ye
Zhongwei Wan
Chuanyang Zheng
...
Haochen Tan
Haoli Bai
Lifeng Shang
Lingpeng Kong
Ngai Wong
LRM
207
0
0
18 Sep 2025
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang
Han Shu
Wenshuo Li
Yingjie Zhai
Xinghao Chen
MLLM
VLM
397
1
0
17 Sep 2025
FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
Yuxuan Cai
Xiaozhuan Liang
X. Wang
Jin Ma
Haijin Liang
Jinwen Luo
Xinyu Zuo
Lisheng Duan
Yuyang Yin
Xi Chen
166
1
0
16 Sep 2025
LATTS: Locally Adaptive Test-Time Scaling
Theo Uscidda
Matthew Trager
Michael Kleinman
Aditya Chattopadhyay
Wei Xia
Stefano Soatto
LRM
183
2
0
16 Sep 2025
Match Chat: Real Time Generative AI and Generative Computing for Tennis
Aaron Baughman
Gozde Akay
Eduardo Morales
Rahul Agarwal
Preetika Srivastava
96
0
0
16 Sep 2025
Decoding in Latent Spaces for Efficient Inference in LLM-based Recommendation
Chengbing Wang
Yang Zhang
Zhicheng Wang
Tianhao Shi
Keqin Bao
Fuli Feng
Tat-Seng Chua
111
0
0
15 Sep 2025
AvatarSync: Rethinking Talking-Head Animation through Phoneme-Guided Autoregressive Perspective
Yuchen Deng
Xiuyang Wu
Hai-Tao Zheng
Suiyang Zhang
Yi He
Yuxing Han
VGen
123
0
0
15 Sep 2025
SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching
Jiacheng Liu
Chang Zou
Yuanhuiyi Lyu
Fei Ren
Shaobo Wang
Kaixin Li
Linfeng Zhang
DiffM
214
5
0
15 Sep 2025
Spec-LLaVA: Accelerating Vision-Language Models with Dynamic Tree-Based Speculative Decoding
Mingxiao Huo
Jiayi Zhang
Hewei Wang
Jinfeng Xu
Zheyu Chen
Huilin Tai
Yijun Chen
MLLM
VLM
148
1
0
15 Sep 2025
SpecVLM: Fast Speculative Decoding in Vision-Language Models
Haiduo Huang
Fuwei Yang
Zhenhua Liu
Xuanwu Yin
Dong Li
Pengju Ren
E. Barsoum
MLLM
VLM
194
0
0
15 Sep 2025
Uncovering Scaling Laws for Large Language Models via Inverse Problems
Arun Verma
Zhaoxuan Wu
Zijian Zhou
Xiaoqiang Lin
Zhiliang Chen
...
Zitong Zhao
Xinyi Xu
Apivich Hemachandra
See-Kiong Ng
Bryan Kian Hsiang Low
LRM
143
0
0
09 Sep 2025
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng
Richard Fan
Shibo Hao
Taylor W. Killian
Haonan Li
...
Xuezhe Ma
Guowei He
Zhiting Hu
Zhengzhong Liu
Eric P. Xing
ReLM
OffRL
ALM
LRM
300
4
0
09 Sep 2025
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
Hanzhen Wang
Jiaming Xu
Jiayi Pan
Yongkang Zhou
Guohao Dai
VLM
123
9
0
06 Sep 2025
Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation
Abdul Waheed
Chancharik Mitra
Laurie Z. Wang
Deva Ramanan
Bhiksha Raj
LRM
129
0
0
05 Sep 2025
Recurrent State Encoders for Efficient Neural Combinatorial Optimization
Tim Dernedde
Daniela Thyssens
Lars Schmidt-Thieme
136
0
0
05 Sep 2025
Set Block Decoding is a Language Model Inference Accelerator
Itai Gat
Heli Ben-Hamu
Marton Havasi
Daniel Haziza
Jeremy Reizenstein
Gabriel Synnaeve
David Lopez-Paz
Brian Karrer
Y. Lipman
149
6
0
04 Sep 2025
Dynamic Speculative Agent Planning
Yilin Guan
Qingfeng Lan
Sun Fei
Dujian Ding
Devang Acharya
Chi Wang
William Yang Wang
OffRL
269
1
0
02 Sep 2025
DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving
Mingyu Yang
Jae-Young Choi
Kihyo Moon
Minsung Jang
Eunjoo Jeon
197
0
0
01 Sep 2025
Efficient Large Language Models with Zero-Shot Adjustable Acceleration
Sajjad Kachuee
M. Sharifkhani
165
0
0
01 Sep 2025
LongCat-Flash Technical Report
M-A-P Team
Bayan
Bei Li
Bingye Lei
Bo Wang
...
Rongxiang Weng
Ruichen Shao
Rumei Li
Shizhe Wu
Shuai Liang
MLLM
MoE
VLM
401
15
0
01 Sep 2025
Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
Shengyin Sun
Yiming Li
Xing Li
Yingzhao Lian
Weizhe Lin
...
Zhiyuan Yang
Chen Chen
Xianzhi Yu
Mingxuan Yuan
Chen Ma
LRM
101
4
0
30 Aug 2025
A Cost-Benefit Analysis of On-Premise Large Language Model Deployment: Breaking Even with Commercial LLM Services
International Symposium on Mixed and Augmented Reality (ISMAR), 2025
Guanzhong Pan
Vishal Chodnekar
Abinas Roy
Haibo Wang
ELM
274
3
0
30 Aug 2025
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Hao Wen
Yifan Su
Feifei Zhang
Yunxin Liu
Yunhao Liu
Y. Zhang
Yuanchun Li
ReLM
LRM
176
14
0
30 Aug 2025
AI Compute Architecture and Evolution Trends
Bor-Sung Liang
171
1
0
29 Aug 2025