Fast Inference from Transformers via Speculative Decoding

International Conference on Machine Learning (ICML), 2022
30 November 2022
Yaniv Leviathan
Matan Kalman
Yossi Matias
    LRM

Papers citing "Fast Inference from Transformers via Speculative Decoding"

50 / 763 papers shown
FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
Divya J. Bajpai
M. Hanawal
MLLM VLM
211
0
0
26 Oct 2025
Memory-based Language Models: An Efficient, Explainable, and Eco-friendly Approach to Large Language Modeling
Antal van den Bosch
Ainhoa Risco Patón
Teun Buijse
Peter Berck
Maarten van Gompel
73
0
0
25 Oct 2025
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
Iskander Azangulov
Teodora Pandeva
Niranjani Prasad
Javier Zazo
Sushrut Karmalkar
DiffM
96
1
0
24 Oct 2025
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
Y. Liu
Lianhui Qin
Shengjie Wang
LRM
200
1
0
23 Oct 2025
Fast Inference via Hierarchical Speculative Decoding
Clara Mohri
Haim Kaplan
Tal Schuster
Yishay Mansour
Amir Globerson
190
0
0
22 Oct 2025
No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models
Zachary Horvitz
Raghav Singhal
Hao Zou
Carles Domingo-Enrich
Zhou Yu
Rajesh Ranganath
Kathleen McKeown
LRM
149
1
0
22 Oct 2025
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
Yuezhou Hu
Jiaxin Guo
Xinyu Feng
Tuo Zhao
100
2
0
22 Oct 2025
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
Hongyi Liu
Jiaji Huang
Zhen Jia
Youngsuk Park
Yu Wang
OffRL
138
2
0
22 Oct 2025
EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs
Benjamin Kubwimana
Qijing Huang
LRM
113
1
0
21 Oct 2025
Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge
Yoshinari Fujinuma
ELM
109
0
0
21 Oct 2025
Planned Diffusion
Daniel Israel
Tian Jin
Ellie Y. Cheng
Guy Van den Broeck
Aditya Grover
Suvinay Subramanian
Michael Carbin
DiffM
132
2
0
20 Oct 2025
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Weifan Guan
Qinghao Hu
Aosheng Li
Jian Cheng
LM&Ro
365
8
0
20 Oct 2025
What Limits Agentic Systems Efficiency?
S. Bian
Minghao Yan
Anand Jayarajan
Gennady Pekhimenko
Shivaram Venkataraman
LLMAG LRM
143
1
0
18 Oct 2025
TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs
Sibo Xiao
Jinyuan Fu
Zhongle Xie
Lidan Shou
AI4TS
165
0
0
17 Oct 2025
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models
Jonas Geiping
Xinyu Yang
Guinan Su
121
0
0
16 Oct 2025
Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
Tianhua Xia
Sai Qian Zhang
92
1
0
16 Oct 2025
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
Nikhil Bhendawade
K. Nishu
Arnav Kundu
Chris Bartels
Minsik Cho
Irina Belousova
LRM
328
0
0
15 Oct 2025
On the Reasoning Abilities of Masked Diffusion Language Models
Anej Svete
Ashish Sabharwal
DiffM LRM
111
0
0
15 Oct 2025
Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons
Giovanni Monea
Yair Feldman
Shankar Padmanabhan
Kianté Brantley
Yoav Artzi
193
1
0
15 Oct 2025
A Survey on Parallel Reasoning
Z. Wang
Boye Niu
Zipeng Gao
Zhi Zheng
Tong Xu
...
Yilong Chen
Chen Zhu
Hua Wu
Haifeng Wang
Enhong Chen
ReLM LRM
181
2
0
14 Oct 2025
A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness
Fali Wang
Jihai Chen
Shuhua Yang
Ali Al-Lawati
Linli Tang
Hui Liu
Suhang Wang
185
2
0
14 Oct 2025
3-Model Speculative Decoding
Sanghyun Byun
Mohanad Odema
Jung Guack
Baisub Lee
Jacob Song
Woo Seong Chung
LRM
93
0
0
14 Oct 2025
DND: Boosting Large Language Models with Dynamic Nested Depth
Tieyuan Chen
Xiaodong Chen
Haoxing Chen
Zhenzhong Lan
W. Lin
Jianguo Li
MoE
234
0
0
13 Oct 2025
Direct Multi-Token Decoding
Xuan Luo
Weizhi Wang
Xifeng Yan
OffRL
103
0
0
13 Oct 2025
DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models
Jinbin Zhang
Nasib Ullah
Erik Schultheis
Rohit Babbar
133
1
0
11 Oct 2025
Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding
Payel Bhattacharjee
Fengwei Tian
Meiyu Zhong
Guangyi Zhang
Osvaldo Simeone
Ravi Tandon
126
0
0
11 Oct 2025
Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy
Xiaoxiao Ma
Feng Zhao
Pengyang Ling
Haibo Qiu
Zhixiang Wei
Hu Yu
Jie Huang
Zhixiong Zeng
Lin Ma
176
2
0
10 Oct 2025
ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers
Shivam Patel
Neharika Jali
Ankur Mallick
Gauri Joshi
132
1
0
10 Oct 2025
Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
Y. Zhang
Muhammad Khalifa
Lechen Zhang
Xin Liu
Ayoung Lee
Xinliang Frederick Zhang
Farima Fatahi Bayat
L. Wang
RALM LRM
103
4
0
10 Oct 2025
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
Yao Teng
Fuyun Wang
Xian Liu
Z. Chen
Han Shi
Yu Wang
Zhenguo Li
Weiyang Liu
Difan Zou
Xihui Liu
DiffM
132
0
0
10 Oct 2025
Placeit! A Framework for Learning Robot Object Placement Skills
Amina Ferrad
J. Huber
Francois Helenon
Julien Gleyze
Mahdi Khoramshahi
Stéphane Doncieux
120
1
0
10 Oct 2025
Scaling Laws for Code: A More Data-Hungry Regime
Xianzhen Luo
Wenzhen Zheng
Qingfu Zhu
Rongyi Zhang
Houyi Li
Siming Huang
YuanTao Fan
Wanxiang Che
ALM
111
2
0
09 Oct 2025
AdaSwitch: Adaptive Switching Generation for Knowledge Distillation
Jingyu Peng
Xinjian Zhao
Hengyi Cai
Yuchen Li
Kai Zhang
Shuaiqiang Wang
D. Yin
Xiangyu Zhao
99
1
0
09 Oct 2025
Lossless Vocabulary Reduction for Auto-Regressive Language Models
Daiki Chijiwa
Taku Hasegawa
Kyosuke Nishida
Shin'ya Yamaguchi
Tomoya Ohba
Tamao Sakao
Susumu Takeuchi
104
1
0
09 Oct 2025
Beyond independent component analysis: identifiability and algorithms
Alvaro Ribot
Anna Seigal
Piotr Zwiernik
CML
75
0
0
08 Oct 2025
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
Haoxin Wang
Xiaolong Tu
Hongyu Ke
Huirong Chai
Dawei Chen
Kyungtae Han
114
1
0
07 Oct 2025
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang
Jue Wang
Zhen
Ben Athiwaratkun
Bhuwan Dhingra
Ce Zhang
James Y. Zou
182
0
0
06 Oct 2025
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
Dachuan Shi
Abedelkadir Asi
Keying Li
Xiangchi Yuan
Leyan Pan
Wenke Lee
Wen Xiao
LRM
146
0
0
06 Oct 2025
Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding
Shrenik Bhansali
Larry Heck
OffRL
68
0
0
06 Oct 2025
Drax: Speech Recognition with Discrete Flow Matching
Aviv Navon
Aviv Shamsian
Neta Glazer
Yael Segal-Feldman
Gill Hetz
Joseph Keshet
Ethan Fetaya
130
0
0
05 Oct 2025
Speculative Actions: A Lossless Framework for Faster Agentic Systems
Naimeng Ye
Arnav Ahuja
Georgios Liargkovas
Yunan Lu
Kostis Kaffes
Tianyi Peng
188
3
0
05 Oct 2025
Self Speculative Decoding for Diffusion Large Language Models
Yifeng Gao
Ziang Ji
Y. Wang
Biqing Qi
Hanlin Xu
Linfeng Zhang
DiffM LRM
320
5
0
05 Oct 2025
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models
Minseo Kim
Coleman Hooper
Aditya Tomar
Chenfeng Xu
Mehrdad Farajtabar
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
168
2
0
05 Oct 2025
Self-Speculative Masked Diffusions
Andrew Campbell
Valentin De Bortoli
Jiaxin Shi
Arnaud Doucet
DiffM
155
4
0
04 Oct 2025
Action Deviation-Aware Inference for Low-Latency Wireless Robots
Jeyoung Park
Yeonsub Lim
Seungeun Oh
Jihong Park
Jinho Choi
Seong-Lyun Kim
171
1
0
03 Oct 2025
The Disparate Impacts of Speculative Decoding
Jameson Sandler
Ahmet Üstün
Marco Romanelli
Sara Hooker
Ferdinando Fioretto
132
1
0
02 Oct 2025
FlashResearch: Real-time Agent Orchestration for Efficient Deep Research
Lunyiu Nie
Nedim Lipka
Ryan Rossi
S. Chaudhuri
121
0
0
02 Oct 2025
Optimal Stopping vs Best-of-$N$ for Inference Time Optimization
Y. Kalayci
Vinod Raman
S. Dughmi
122
0
0
01 Oct 2025
HiSpec: Hierarchical Speculative Decoding for LLMs
Avinash Kumar
Sujay Sanghavi
Poulami Das
119
1
0
01 Oct 2025
Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models
Shutong Wu
Jiawei Zhang
DiffM
314
2
0
30 Sep 2025