2302.07863
Cited By
v1
v2
v3
v4 (latest)
Speculative Decoding with Big Little Decoder
Neural Information Processing Systems (NeurIPS), 2023
15 February 2023
Sehoon Kim
K. Mangalam
Suhong Moon
Jitendra Malik
Michael W. Mahoney
A. Gholami
Kurt Keutzer
MoE
Papers citing "Speculative Decoding with Big Little Decoder"
50 / 103 papers shown
Title
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari
Haocheng Xi
Aditya Tomar
Coleman Hooper
Sehoon Kim
Maxwell Horton
Mahyar Najibi
Michael W. Mahoney
Kurt Keutzer
Amir Gholami
MQ
193
9
0
05 Feb 2025
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
International Conference on Learning Representations (ICLR), 2025
Gregor Bachmann
Sotiris Anagnostidis
Albert Pumarola
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Edgar Schönfeld
Ali K. Thabet
Jonas Kohler
ALM
BDL
350
28
0
31 Jan 2025
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xiangxiang Gao
Weisheng Xie
Yiwei Xiang
Feng Ji
450
14
0
17 Dec 2024
Constrained Decoding with Speculative Lookaheads
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Nishanth Nakshatri
Shamik Roy
Rajarshi Das
Suthee Chaidaroon
Leonid Boytsov
Rashmi Gangadharaiah
410
3
0
09 Dec 2024
Software Performance Engineering for Foundation Model-Powered Software (FMware)
Haoxiang Zhang
Shi Chang
Arthur Leung
Kishanthan Thangarajah
Boyuan Chen
Hanan Lutfiyya
Ahmed E. Hassan
552
2
0
14 Nov 2024
When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
Jiankun Wei
Abdulrahman Abdulrazzag
Tianchen Zhang
Adel Muursepp
Gururaj Saileshwar
345
4
0
01 Nov 2024
A Theoretical Perspective for Speculative Decoding Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
180
20
0
30 Oct 2024
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges
Asilomar Conference on Signals, Systems and Computers (ACSSC), 2024
Ruisi Zhang
F. Koushanfar
WaLM
229
3
0
24 Oct 2024
AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
Sudhanshu Agrawal
Wonseok Jeon
Mingu Lee
129
10
0
24 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
199
1
0
14 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
International Conference on Learning Representations (ICLR), 2024
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
305
36
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
IEEE Circuits and Systems Magazine (IEEE CSM), 2024
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai Li
Yiran Chen
169
17
0
08 Oct 2024
ESPACE: Dimensionality Reduction of Activations for Model Compression
Neural Information Processing Systems (NeurIPS), 2024
Charbel Sakr
Brucek Khailany
190
13
0
07 Oct 2024
Efficient Inference for Large Language Model-based Generative Recommendation
International Conference on Learning Representations (ICLR), 2024
Xinyu Lin
Chaoqun Yang
Wenjie Wang
Yongqi Li
Cunxiao Du
Fuli Feng
See-Kiong Ng
Tat-Seng Chua
294
13
0
07 Oct 2024
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference
Zongyue Qin
Zifan He
Neha Prakriya
Jason Cong
Yizhou Sun
260
7
0
25 Sep 2024
Multi-Programming Language Ensemble for Code Generation in Large Language Model
Tengfei Xue
Xuefeng Li
Tahir Azim
Roman Smirnov
Jianhui Yu
Arash Sadrieh
Babak Pahlavan
196
3
0
06 Sep 2024
Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Sarath Chandar
197
5
0
16 Aug 2024
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
Bin Xiao
Lujun Gui
Lei Su
Weipeng Chen
172
5
0
01 Aug 2024
Adaptive Draft-Verification for Efficient Large Language Model Decoding
Xukun Liu
Bowen Lei
Ruqi Zhang
Dongkuan Xu
214
7
0
27 Jun 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
346
171
0
24 Jun 2024
LiveMind: Low-latency Large Language Models with Simultaneous Inference
Chuangtao Chen
Grace Li Zhang
Xunzhao Yin
Cheng Zhuo
Ulf Schlichtmann
Bing Li
LRM
267
7
0
20 Jun 2024
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Ke Cheng
Wen Hu
Zhi Wang
Hongen Peng
Jianguo Li
Sheng Zhang
151
14
0
19 Jun 2024
Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
Kaiyan Zhang
Jianyu Wang
Ning Ding
Biqing Qi
Ermo Hua
Xingtai Lv
Bowen Zhou
287
14
0
18 Jun 2024
Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
Ke Cheng
Wen Hu
Zhi Wang
Peng Du
Jianguo Li
Sheng Zhang
241
16
0
07 Jun 2024
Fast yet Safe: Early-Exiting with Risk Control
Metod Jazbec
Alexander Timans
Tin Hadži Veljković
K. Sakmann
Dan Zhang
C. A. Naesseth
Eric T. Nalisnick
238
12
0
31 May 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
M. Y. Wang
433
38
0
30 May 2024
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
Seungyeon Kim
Neha Gupta
A. Menon
Sanjiv Kumar
LRM
316
19
0
29 May 2024
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Ethan Shen
Alan Fan
Sarah M Pratt
Jae Sung Park
Matthew Wallingford
Sham Kakade
Ari Holtzman
Ranjay Krishna
Ali Farhadi
Aditya Kusupati
302
4
0
28 May 2024
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
Hao Mark Chen
Wayne Luk
Ka-Fai Cedric Yiu
Rui Li
Konstantin Mishchenko
Stylianos I. Venieris
Hongxiang Fan
213
14
0
28 May 2024
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
International Conference on Learning Representations (ICLR), 2024
Nadav Timor
Jonathan Mamou
Daniel Korat
Moshe Berchansky
Oren Pereg
Moshe Wasserblat
Tomer Galanti
Michal Gordon
David Harel
LRM
200
6
0
23 May 2024
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi
Vinija Jain
Mingye Gao
Malavika Srikanth
Vasu Sharma
OffRL
288
7
0
15 May 2024
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
Bin Xiao
Chunan Shi
Xiaonan Nie
Fan Yang
Xiangwei Deng
Lei Su
Weipeng Chen
Tengjiao Wang
224
10
0
01 May 2024
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding
Ankur Mallick
Chi Wang
Robert Sim
Subhabrata Mukherjee
Victor Rühle
L. Lakshmanan
Ahmed Hassan Awadallah
331
174
0
22 Apr 2024
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Tyler Griggs
Xiaoxuan Liu
Jiaxiang Yu
Doyoung Kim
Wei-Lin Chiang
Alvin Cheung
Ion Stoica
249
23
0
22 Apr 2024
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun
Zhuoming Chen
Xinyu Yang
Yuandong Tian
Beidi Chen
301
83
0
18 Apr 2024
Exploring and Improving Drafts in Blockwise Parallel Decoding
Taehyeon Kim
A. Suresh
Kishore Papineni
Michael Riley
Sanjiv Kumar
Adrian Benton
AI4TS
241
4
0
14 Apr 2024
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jie Ou
Yueming Chen
Wenhong Tian
232
21
0
10 Apr 2024
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Michael Hassid
Tal Remez
Jonas Gehring
Roy Schwartz
Yossi Adi
245
40
0
31 Mar 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
531
143
0
26 Feb 2024
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
Ziqian Zeng
Jiahong Yu
Qianshi Pang
Zihao Wang
Huiping Zhuang
Cen Chen
Xiaofeng Zou
214
5
0
24 Feb 2024
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding
Hanling Yi
Feng-Huei Lin
Hongbin Li
Peiyang Ning
Xiaotian Yu
Rong Xiao
LRM
267
21
0
19 Feb 2024
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park
Jake Hyun
SangLyul Cho
Bonggeun Sim
Jae W. Lee
MQ
273
38
0
16 Feb 2024
Tandem Transformers for Inference Efficient LLMs
Aishwarya P S
Pranav Ajit Nair
Yashas Samaga
Toby Boyd
Sanjiv Kumar
Prateek Jain
Praneeth Netrapalli
152
10
0
13 Feb 2024
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du
Jing Jiang
Yuanchen Xu
Jiawei Wu
Sicheng Yu
...
Shenggui Li
Kai Xu
Liqiang Nie
Zhaopeng Tu
Yang You
191
57
0
03 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
325
231
0
03 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
International Conference on Machine Learning (ICML), 2024
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
462
295
0
26 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
369
196
0
15 Jan 2024
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
International Conference on Machine Learning (ICML), 2023
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
390
53
0
08 Dec 2023
Efficient Deep Speech Understanding at the Edge
Rongxiang Wang
Felix Lin
146
1
0
22 Nov 2023
Speculative Contrastive Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hongyi Yuan
Keming Lu
Fei Huang
Zheng Yuan
Chang Zhou
139
8
0
15 Nov 2023