arXiv:2211.17192
Fast Inference from Transformers via Speculative Decoding
International Conference on Machine Learning (ICML), 2023
30 November 2022
Yaniv Leviathan, Matan Kalman, Yossi Matias
Papers citing "Fast Inference from Transformers via Speculative Decoding" (13 of 763 shown)
Parallel Sampling of Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari
25 May 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai
22 May 2023
ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time
Conference on Machine Learning and Systems (MLSys), 2023
Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, T. Mowry
17 May 2023
Accelerating Transformer Inference for Translation via Parallel Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, R. Marin, Emanuele Rodolà
17 May 2023
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Xupeng Miao, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, ..., Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
16 May 2023
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Lingjiao Chen, Matei A. Zaharia, James Zou
09 May 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Boyao Wang, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
13 Apr 2023
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024
Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva
16 Mar 2023
Speculative Decoding with Big Little Decoder
Neural Information Processing Systems (NeurIPS), 2023
Sehoon Kim, K. Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, A. Gholami, Kurt Keutzer
15 Feb 2023
Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen, Sebastian Borgeaud, G. Irving, Jean-Baptiste Lespiau, Laurent Sifre, J. Jumper
02 Feb 2023
Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui
30 Mar 2022
Pretrained Language Models for Text Generation: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Junyi Li, Tianyi Tang, Wayne Xin Zhao, J. Nie, Ji-Rong Wen
14 Jan 2022
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
06 Nov 2019