Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias
International Conference on Machine Learning (ICML), 2023
arXiv 2211.17192, 30 November 2022
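
The headline paper names speculative decoding: a small draft model proposes a short block of tokens, the large target model scores all of them in a single parallel pass, and the longest agreeing prefix is kept, so each expensive target pass can emit several tokens. The sketch below is a minimal greedy toy illustration of that idea, not the authors' implementation; `draft_step`, `target_logits`, the toy vocabulary, and the window `gamma` are hypothetical stand-ins, and the paper's full method uses a rejection-sampling acceptance rule (of which greedy agreement is a special case).

```python
import numpy as np

VOCAB = 50  # hypothetical toy vocabulary size


def draft_step(prefix):
    # Hypothetical cheap "draft" model: a dummy rule standing in for a small LM.
    return (prefix[-1] + 1) % VOCAB


def target_logits(prefix):
    # Hypothetical expensive "target" model: deterministic pseudo-random logits per prefix.
    rng = np.random.default_rng(sum(prefix))
    return rng.normal(size=VOCAB)


def speculative_decode(prefix, gamma=4, max_len=16):
    """Greedy toy variant: the draft proposes `gamma` tokens, the target checks them
    in one batched pass, and the longest agreeing prefix (plus one target token) is
    kept, so every target pass emits at least one token."""
    prefix = list(prefix)
    while len(prefix) < max_len:
        # 1) Draft model proposes gamma tokens autoregressively (cheap, sequential).
        proposal, ctx = [], list(prefix)
        for _ in range(gamma):
            t = draft_step(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target model scores all gamma+1 continuation points in one batch.
        contexts = [prefix + proposal[:i] for i in range(gamma + 1)]
        target_choice = [int(np.argmax(target_logits(c))) for c in contexts]
        # 3) Accept draft tokens while they match the target's greedy choice.
        n = 0
        while n < gamma and proposal[n] == target_choice[n]:
            n += 1
        # 4) Keep the accepted tokens plus the target's own next token.
        prefix += proposal[:n] + [target_choice[n]]
    return prefix[:max_len]


print(speculative_decode([1, 2, 3]))
```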

Papers citing "Fast Inference from Transformers via Speculative Decoding"

Showing 13 of 763 citing papers.

Parallel Sampling of Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari
25 May 2023

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai
22 May 2023

ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile Time
Conference on Machine Learning and Systems (MLSys), 2023
Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, T. Mowry
17 May 2023

Accelerating Transformer Inference for Translation via Parallel Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, R. Marin, Emanuele Rodolà
17 May 2023

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Xupeng Miao, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, ..., Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
16 May 2023

FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Lingjiao Chen, Matei A. Zaharia, James Zou
09 May 2023

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Boyao Wang, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
13 Apr 2023

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
International Conference on Language Resources and Evaluation (LREC), 2023
Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva
16 Mar 2023

Speculative Decoding with Big Little Decoder
Neural Information Processing Systems (NeurIPS), 2023
Sehoon Kim, K. Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, A. Gholami, Kurt Keutzer
15 Feb 2023

Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen, Sebastian Borgeaud, G. Irving, Jean-Baptiste Lespiau, Laurent Sifre, J. Jumper
02 Feb 2023

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui
30 Mar 2022

Pretrained Language Models for Text Generation: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Junyi Li, Tianyi Tang, Wayne Xin Zhao, J. Nie, Ji-Rong Wen
14 Jan 2022

Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
06 Nov 2019