
Fast Inference from Transformers via Speculative Decoding

International Conference on Machine Learning (ICML), 2022
30 November 2022
Yaniv Leviathan
Matan Kalman
Yossi Matias
    LRM
ArXiv (abs) · PDF · HTML · HuggingFace (9 upvotes)
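For context on what the cited paper proposes, below is a minimal sketch of its speculative sampling accept/resample rule: a small draft model proposes tokens, the target model verifies them in parallel, and each drafted token is accepted with probability min(1, p(x)/q(x)). The function name and array-of-distributions inputs are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def speculative_accept(p_target, q_draft, drafted_tokens, rng=None):
    """Sketch of the speculative decoding accept/resample rule.

    p_target[i] and q_draft[i]: target and draft next-token distributions
    at draft position i (p_target has one extra row for the bonus token).
    drafted_tokens[i]: token sampled from the draft model at position i.
    Returns the accepted prefix plus one correction/bonus token.
    """
    rng = rng or np.random.default_rng()
    accepted = []
    for i, x in enumerate(drafted_tokens):
        p, q = p_target[i], q_draft[i]
        # Accept drafted token x with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(int(x))
        else:
            # On rejection, resample from the residual max(0, p - q), renormalized,
            # and stop; this keeps the output distribution identical to the target's.
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(len(p), p=residual)))
            return accepted
    # All drafts accepted: sample one extra token from the target's next distribution.
    accepted.append(int(rng.choice(len(p_target[-1]), p=p_target[-1])))
    return accepted
```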

Papers citing "Fast Inference from Transformers via Speculative Decoding"

50 / 763 papers shown
Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models
Seungeun Oh
Jinhyuk Kim
Jihong Park
Seung-Woo Ko
Tony Q. S. Quek
Seong-Lyun Kim
337
19
0
17 Dec 2024
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiaochen Li
Jiajie Jin
Yujia Zhou
Yongkang Wu
Zhonghua Li
Qi Ye
Zhicheng Dou
RALM, LRM
408
20
0
16 Dec 2024
NITRO: LLM Inference on Intel Laptop NPUs
Anthony Fei
Mohamed S. Abdelfattah
127
5
0
15 Dec 2024
Constrained Decoding with Speculative Lookaheads
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Nishanth Nakshatri
Shamik Roy
Rajarshi Das
Suthee Chaidaroon
Leonid Boytsov
Rashmi Gangadharaiah
456
3
0
09 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
269
4
0
03 Dec 2024
PLD+: Accelerating LLM inference by leveraging Language Model Artifacts
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Shwetha Somasundaram
Anirudh Phukan
Apoorv Saxena
370
9
0
02 Dec 2024
Neutralizing Backdoors through Information Conflicts for Large Language Models
Chen Chen
Yuchen Sun
Xueluan Gong
Jiaxin Gao
K. Lam
KELM, AAML
382
3
0
27 Nov 2024
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
Neural Information Processing Systems (NeurIPS), 2024
Zhuofan Wen
Shangtong Gui
Yang Feng
407
9
0
25 Nov 2024
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
Hyun Ryu
Eric Kim
358
3
0
20 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yaojie Lu
Hongyu Lin
ALM
590
9
0
18 Nov 2024
Debiasing Watermarks for Large Language Models via Maximal Coupling
Journal of the American Statistical Association (JASA), 2024
Yangxinyu Xie
Xiang Li
Tanwi Mallick
Weijie J. Su
Ruixun Zhang
WaLM
355
12
0
17 Nov 2024
SAM Decoding: Speculative Decoding via Suffix Automaton
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuxuan Hu
Ke Wang
Jing Zhang
Fanjin Zhang
Xuefei Liu
Zeyang Zhang
Jing Zhang
478
18
0
16 Nov 2024
SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ryan Sun
Tianyi Zhou
Xun Chen
Lichao Sun
213
7
0
08 Nov 2024
SSSD: Simply-Scalable Speculative Decoding
Michele Marzollo
Jiawei Zhuang
Niklas Roemer
Lorenz K. Müller
Lukas Cavigelli
LRM
341
1
0
08 Nov 2024
SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
Xupeng Miao
Zhihao Jia
Daniel F Campos
Aurick Qiao
LRM
350
4
0
07 Nov 2024
The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation
Lawrence Stewart
Matthew Trager
Sujan Kumar Gonugondla
Stefano Soatto
223
10
0
06 Nov 2024
When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
Jiankun Wei
Abdulrahman Abdulrazzag
Tianchen Zhang
Adel Muursepp
Gururaj Saileshwar
425
4
0
01 Nov 2024
Interpretable Next-token Prediction via the Generalized Induction Head
Eunji Kim
Sriya Mantena
Weiwei Yang
Chandan Singh
Sungroh Yoon
Jianfeng Gao
371
1
0
31 Oct 2024
Accelerated AI Inference via Dynamic Execution Methods
Haim Barad
Jascha Achterberg
Tien Pei Chou
Jean Yu
249
1
0
30 Oct 2024
A Theoretical Perspective for Speculative Decoding Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
217
20
0
30 Oct 2024
The Impact of Inference Acceleration on Bias of LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Elisabeth Kirsten
Ivan Habernal
Vedant Nanda
Muhammad Bilal Zafar
356
0
0
29 Oct 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
318
12
0
29 Oct 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
488
20
0
29 Oct 2024
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yuzhe Yang
Yipeng Du
Ahmad Farhan
Claudio Angione
Yue Zhao
Harry Yang
Fielding Johnston
James Buban
Patrick Colangelo
296
0
0
28 Oct 2024
Transferable Post-training via Inverse Value Learning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Xinyu Lu
Xueru Wen
Yaojie Lu
Bowen Yu
Hongyu Lin
Haiyang Yu
Le Sun
Jia Zheng
Yongbin Li
239
1
0
28 Oct 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Yiyuan Ma
Wenlei Bao
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
459
55
0
28 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
International Conference on Learning Representations (ICLR), 2024
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
396
20
0
28 Oct 2024
FIRP: Faster LLM inference via future intermediate representation prediction
Natural Language Processing and Chinese Computing (NLPCC), 2024
Pengfei Wu
Jiahao Liu
Zhuocheng Gong
Qifan Wang
Jinpeng Li
Jingang Wang
Xunliang Cai
Dongyan Zhao
AI4CE
107
0
0
27 Oct 2024
Fast Best-of-N Decoding via Speculative Rejection
Neural Information Processing Systems (NeurIPS), 2024
Hanshi Sun
Momin Haider
Ruiqi Zhang
Huitao Yang
Jiahao Qiu
Ming Yin
Mengdi Wang
Peter L. Bartlett
Andrea Zanette
BDL
378
101
0
26 Oct 2024
Dynamic layer selection in decoder-only transformers
Theodore Glavas
Joud Chataoui
Florence Regol
Wassim Jabbour
Antonios Valkanas
Boris N. Oreshkin
Mark Coates
AI4CE
288
2
0
26 Oct 2024
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges
Asilomar Conference on Signals, Systems and Computers (ACSSC), 2024
Ruisi Zhang
F. Koushanfar
WaLM
291
3
0
24 Oct 2024
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
Sudhanshu Agrawal
Wonseok Jeon
Mingu Lee
144
10
0
24 Oct 2024
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
International Conference on Learning Representations (ICLR), 2024
Ashish Khisti
MohammadReza Ebrahimi
Hassan Dbouk
Arash Behboodi
Roland Memisevic
Christos Louizos
334
2
0
23 Oct 2024
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Artem Basharin
Andrei Chertkov
Ivan Oseledets
407
3
0
23 Oct 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
International Conference on Machine Learning (ICML), 2024
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
344
11
0
19 Oct 2024
MoDification: Mixture of Depths Made Easy
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
C. Zhang
M. Zhong
Qimeng Wang
Xuantao Lu
Zheyu Ye
...
Yan Gao
Yao Hu
Kehai Chen
Min Zhang
Dawei Song
VLM, MoE
204
2
0
18 Oct 2024
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu
Yifu Lu
Yifan Zeng
Jiacheng Guo
Jiayi Geng
...
Ling Yang
Mengdi Wang
Kaixuan Huang
Yue Wu
Mengdi Wang
489
50
0
18 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
272
8
0
17 Oct 2024
Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement
Yuxuan Liu
Wenyuan Li
Laizhong Cui
Hailiang Yang
OffRL
139
1
0
17 Oct 2024
Learning to Route LLMs with Confidence Tokens
Yu-Neng Chuang
Helen Zhou
Prathusha Kameswara Sarma
Parikshit Gopalan
John Boccio
Sara Bolouki
Helen Zhou
287
0
0
17 Oct 2024
DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure
Yunfan Xiong
Ruoyu Zhang
Yanzeng Li
Tianhao Wu
Lei Zou
199
13
0
15 Oct 2024
Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Qihuang Zhong
Kunfeng Chen
Liang Ding
Juhua Liu
Di Lin
Dacheng Tao
163
1
0
15 Oct 2024
QSpec: Speculative Decoding with Complementary Quantization Schemes
Juntao Zhao
Wenhao Lu
Sheng Wang
Lingpeng Kong
Chuan Wu
MQ
440
11
0
15 Oct 2024
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenyuan Xu
Rujun Han
Zhenting Wang
L. Le
Dhruv Madeka
Lei Li
Wenjie Wang
Rishabh Agarwal
Zifeng Wang
Tomas Pfister
598
27
0
15 Oct 2024
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling
Wenze Liu
Le Zhuo
Yi Xin
Sheng Xia
Peng Gao
Xiangyu Yue
227
17
0
14 Oct 2024
Probabilistic Degeneracy Detection for Point-to-Plane Error Minimization
IEEE Robotics and Automation Letters (RA-L), 2024
Johan Hatleskog
Kostas Alexis
3DPC
406
9
0
14 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
493
4
0
13 Oct 2024
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
Yuxi Xie
Anirudh Goyal
Xiaobao Wu
Xunjian Yin
Xiao Xu
Min-Yen Kan
Liangming Pan
William Yang Wang
LRM
895
1
0
12 Oct 2024
QEFT: Quantization for Efficient Fine-Tuning of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Changhun Lee
Jun-gyu Jin
Eunhyeok Park
MQ
214
4
0
11 Oct 2024
KV Prediction for Improved Time to First Token
Maxwell Horton
Qingqing Cao
Chenfan Sun
Yanzi Jin
Sachin Mehta
Mohammad Rastegari
Moin Nabi
AI4TS
240
8
0
10 Oct 2024
Page 9 of 16