2211.17192
Fast Inference from Transformers via Speculative Decoding
International Conference on Machine Learning (ICML), 2022
30 November 2022
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
Papers citing "Fast Inference from Transformers via Speculative Decoding"
50 / 763 papers shown
Hot PATE: Private Aggregation of Distributions for Diverse Task
Edith Cohen
Benjamin Cohen-Wang
Xin Lyu
Jelani Nelson
Tamas Sarlos
Uri Stemmer
523
4
0
04 Dec 2023
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
James Enouen
Hootan Nakhost
Sayna Ebrahimi
Sercan O. Arik
Yan Liu
Tomas Pfister
337
14
0
03 Dec 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Shafiq Joty
ELM
CLL
AI4MH
LRM
ALM
361
31
0
28 Nov 2023
PaSS: Parallel Speculative Sampling
Giovanni Monea
Armand Joulin
Edouard Grave
MoE
219
45
0
22 Nov 2023
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
Youhe Jiang
Ran Yan
Xiaozhe Yao
Yang Zhou
Beidi Chen
Binhang Yuan
SyDa
224
32
0
20 Nov 2023
Speculative Contrastive Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hongyi Yuan
Keming Lu
Fei Huang
Zheng Yuan
Chang Zhou
165
8
0
15 Nov 2023
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Hongxuan Zhang
Zhining Liu
Yao Zhao
Jiaqi Zheng
Chenyi Zhuang
Jinjie Gu
Guihai Chen
LRM
MLLM
217
2
0
14 Nov 2023
REST: Retrieval-Based Speculative Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Zhenyu He
Zexuan Zhong
Tianle Cai
Jason D. Lee
Di He
RALM
294
121
0
14 Nov 2023
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zihao Wang
Shaofei Cai
Hoang Trung-Dung
Yonggang Jin
Jinbing Hou
...
Zhaofeng He
Zilong Zheng
Yaodong Yang
Xiaojian Ma
Yitao Liang
LLMAG
LM&Ro
373
156
0
10 Nov 2023
Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiali Zeng
Fandong Meng
Yongjing Yin
Jie Zhou
278
14
0
06 Nov 2023
GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
Farnoosh Javadi
Walid Ahmed
Habib Hajimolahoseini
Foozhan Ataiefard
Mohammad Hassanpour
Saina Asani
Austin Wen
Omar Mohamed Awad
Kangling Liu
Yang Liu
VLM
303
8
0
06 Nov 2023
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Bjorn Deiseroth
Max Meuer
Nikolas Gritsch
C. Eichenberg
P. Schramowski
Matthias Aßenmacher
Kristian Kersting
66
3
0
02 Nov 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
VLM
340
104
0
01 Nov 2023
The Synergy of Speculative Decoding and Batching in Serving Large Language Models
Qidong Su
Christina Giannoula
Gennady Pekhimenko
169
18
0
28 Oct 2023
Punica: Multi-Tenant LoRA Serving
Conference on Machine Learning and Systems (MLSys), 2023
Lequn Chen
Zihao Ye
Yongji Wu
Danyang Zhuo
Luis Ceze
Arvind Krishnamurthy
218
62
0
28 Oct 2023
Controlled Decoding from Language Models
International Conference on Machine Learning (ICML), 2023
Sidharth Mudgal
Jong Lee
H. Ganapathy
Yaguang Li
Tao Wang
...
Michael Collins
Trevor Strohman
Jilin Chen
Alex Beutel
Ahmad Beirami
463
113
0
25 Oct 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar
Dan Alistarh
MQ
MoE
260
37
0
25 Oct 2023
SpecTr: Fast Speculative Decoding via Optimal Transport
Neural Information Processing Systems (NeurIPS), 2023
Ziteng Sun
A. Suresh
Jae Hun Ro
Ahmad Beirami
Himanshu Jain
Felix X. Yu
329
117
0
23 Oct 2023
Large Search Model: Redefining Search Stack in the Era of LLMs
Liang Wang
Nan Yang
Xiaolong Huang
Linjun Yang
Rangan Majumder
Furu Wei
LRM
KELM
227
25
0
23 Oct 2023
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Eric Mitchell
Rafael Rafailov
Archit Sharma
Chelsea Finn
Christopher D. Manning
ALM
303
65
0
19 Oct 2023
SPEED: Speculative Pipelined Execution for Efficient Decoding
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Hasan Genç
Kurt Keutzer
A. Gholami
Y. Shao
203
48
0
18 Oct 2023
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Huaijie Wang
Lingxiao Ma
Fan Yang
Ruiping Wang
Yi Wu
Furu Wei
MQ
223
185
0
17 Oct 2023
Enhanced Transformer Architecture for Natural Language Processing
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2023
Woohyeon Moon
Taeyoung Kim
Bumgeun Park
Dongsoo Har
226
0
0
17 Oct 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
Saleh Ashkboos
Ilia Markov
Elias Frantar
Tingxuan Zhong
Xincheng Wang
Jie Ren
Torsten Hoefler
Dan Alistarh
MQ
SyDa
357
35
0
13 Oct 2023
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
International Conference on Learning Representations (ICLR), 2023
Mengkang Hu
Yao Mu
Xinmiao Yu
Mingyu Ding
Shiguang Wu
Wenqi Shao
Qiguang Chen
Bin Wang
Yu Qiao
Ping Luo
LLMAG
226
51
0
12 Oct 2023
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
International Conference on Learning Representations (ICLR), 2023
Yongchao Zhou
Kaifeng Lyu
A. S. Rawat
A. Menon
Afshin Rostamizadeh
Sanjiv Kumar
Jean-François Kagy
Rishabh Agarwal
266
123
0
12 Oct 2023
MatFormer: Nested Transformer for Elastic Inference
Neural Information Processing Systems (NeurIPS), 2023
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
255
61
0
11 Oct 2023
CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), 2023
Yuhan Liu
Hanchen Li
Yihua Cheng
Siddhant Ray
Yuyang Huang
...
Ganesh Ananthanarayanan
Michael Maire
Henry Hoffmann
Ari Holtzman
Junchen Jiang
566
141
0
11 Oct 2023
Online Speculative Decoding
International Conference on Machine Learning (ICML), 2023
Xiaoxuan Liu
Lanxiang Hu
Peter Bailis
Alvin Cheung
Zhijie Deng
Ion Stoica
Hao Zhang
393
84
0
11 Oct 2023
CoQuest: Exploring Research Question Co-Creation with an LLM-based Agent
International Conference on Human Factors in Computing Systems (CHI), 2023
Yiren Liu
Si Chen
Haocong Cheng
Mengxia Yu
Xiao Ran
Andrew Mo
Yiliu Tang
Yun Huang
LLMAG
336
75
0
09 Oct 2023
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sangmin Bae
Jongwoo Ko
Hwanjun Song
SeYoung Yun
270
78
0
09 Oct 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
490
100
0
06 Oct 2023
DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models
International Conference on Human Factors in Computing Systems (CHI), 2023
Damien Masson
Sylvain Malacria
Géry Casiez
Daniel Vogel
AI4CE
KELM
MLLM
255
69
0
05 Oct 2023
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
International Conference on Learning Representations (ICLR), 2023
Murong Yue
Jie Zhao
Min Zhang
Liang Du
Ziyu Yao
LRM
351
118
0
04 Oct 2023
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
International Conference on Machine Learning (ICML), 2023
Xidong Feng
Bo Liu
Muning Wen
Alexander Shmakov
Ying Wen
Weinan Zhang
Jun Wang
LRM
AI4CE
261
286
0
29 Sep 2023
Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities
IEEE Communications Magazine (IEEE Commun. Mag.), 2023
Zhengyi Lin
Guanqiao Qu
Qiyuan Chen
Randy Sarayar
Zhe Chen
Kaibin Huang
493
150
0
28 Sep 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yuan Liu
Bing Qin
Ting Liu
LRM
AI4CE
493
222
0
27 Sep 2023
LMDX: Language Model-based Document Information Extraction and Localization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
228
52
0
19 Sep 2023
LLMCad: Fast and Scalable On-device Large Language Model Inference
Daliang Xu
Wangsong Yin
Xin Jin
Yanzhe Zhang
Shiyun Wei
Mengwei Xu
Xuanzhe Liu
207
70
0
08 Sep 2023
SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks
Mojtaba Valipour
Mehdi Rezagholizadeh
Hossein Rajabzadeh
Parsa Kavehzadeh
Marzieh S. Tahaei
Boxing Chen
Ali Ghodsi
133
2
0
01 Sep 2023
Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
Elizaveta Kostenok
D. Cherniavskii
Alexey Zaytsev
249
9
0
22 Aug 2023
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector
Chris Re
270
150
0
08 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
275
3
0
07 Aug 2023
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang
Gibbeum Lee
Jaewoong Cho
Dimitris Papailiopoulos
Kangwook Lee
224
46
0
12 Jul 2023
Query Understanding in the Age of Large Language Models
Avishek Anand
Venktesh V
Abhijit Anand
Vinay Setty
LRM
259
9
0
28 Jun 2023
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shizhe Diao
Boyao Wang
Hanze Dong
Kashun Shum
Jipeng Zhang
Wei Xiong
Tong Zhang
ALM
297
76
0
21 Jun 2023
GLIMMER: generalized late-interaction memory reranker
Michiel de Jong
Yury Zemlyanskiy
Nicholas FitzGerald
Sumit Sanghai
William W. Cohen
Joshua Ainslie
RALM
232
9
0
17 Jun 2023
On Optimal Caching and Model Multiplexing for Large Model Inference
Banghua Zhu
Ying Sheng
Lianmin Zheng
Clark W. Barrett
Sai Li
Jiantao Jiao
306
28
0
03 Jun 2023
Exploring the Practicality of Generative Retrieval on Dynamic Corpora
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Soyoung Yoon
Chaeeun Kim
Hyunji Lee
Joel Jang
Sohee Yang
Minjoon Seo
319
6
0
27 May 2023
Large Language Models as Tool Makers
International Conference on Learning Representations (ICLR), 2023
Tianle Cai
Xuezhi Wang
Tengyu Ma
Xinyun Chen
Denny Zhou
LLMAG
279
262
0
26 May 2023