Speculative Decoding with Big Little Decoder
arXiv:2302.07863. Neural Information Processing Systems (NeurIPS), 2023. 15 February 2023.
Sehoon Kim, K. Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, A. Gholami, Kurt Keutzer. [MoE]
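For readers skimming this citation list, the title names the draft-and-verify (speculative decoding) pattern: a small "little" model drafts tokens cheaply and a large "big" model verifies them. The sketch below is a minimal, hypothetical Python illustration of that general loop, not the paper's exact BiLD fallback/rollback policy; `small_model`, `large_model`, and `speculative_decode` are toy stand-ins invented for illustration.

```python
# Minimal sketch of the draft-and-verify loop behind speculative decoding.
# Illustrates only the general idea named in the title, not the paper's exact
# BiLD policy. small_model / large_model are hypothetical toy stand-ins that
# greedily pick the next token id.

def small_model(prefix):
    # Cheap drafter: toy rule in place of a small language model.
    return (prefix[-1] + 1) % 50 if prefix else 0

def large_model(prefix):
    # Expensive verifier: toy rule in place of a large language model.
    return (prefix[-1] + 1) % 50 if prefix else 0

def speculative_decode(prefix, max_new_tokens=8, draft_len=4):
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        # 1) Draft: the small model proposes a block of tokens autoregressively.
        draft = []
        for _ in range(draft_len):
            draft.append(small_model(out + draft))
        # 2) Verify: the large model checks each drafted position (in a real
        #    system this is a single batched forward pass over the block).
        accepted = []
        for tok in draft:
            target = large_model(out + accepted)
            if tok == target:
                accepted.append(tok)      # draft agrees with the large model
            else:
                accepted.append(target)   # mismatch: keep the large model's
                break                     # token and discard the rest
        out.extend(accepted)
    return out[:len(prefix) + max_new_tokens]

if __name__ == "__main__":
    print(speculative_decode([3]))  # e.g. [3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Because the two toy models agree on every token here, each drafted block is accepted in full; with real models the speedup comes from amortizing one expensive verification pass over several cheaply drafted tokens, while mismatches roll back to the large model's choice.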

Papers citing "Speculative Decoding with Big Little Decoder"

Showing 50 of 103 citing papers:
- QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache. Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kemal Kurniawan, Amir Gholami. 05 Feb 2025. [MQ]
- Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment. International Conference on Learning Representations (ICLR), 2025. Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, A. Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali K. Thabet, Jonas Kohler. 31 Jan 2025. [ALMBDL]
- Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree. AAAI Conference on Artificial Intelligence (AAAI), 2024. Xiangxiang Gao, Weisheng Xie, Yiwei Xiang, Feng Ji. 17 Dec 2024.
- Constrained Decoding with Speculative Lookaheads. North American Chapter of the Association for Computational Linguistics (NAACL), 2024. Nishanth Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah. 09 Dec 2024.
- Software Performance Engineering for Foundation Model-Powered Software (FMware). Haoxiang Zhang, Shi Chang, Arthur Leung, Kishanthan Thangarajah, Boyuan Chen, Hanan Lutfiyya, Ahmed E. Hassan. 14 Nov 2024.
- When Speculation Spills Secrets: Side Channels via Speculative Decoding in LLMs. Jiankun Wei, Abdulrahman Abdulrazzag, Tianchen Zhang, Adel Muursepp, Gururaj Saileshwar. 01 Nov 2024.
- A Theoretical Perspective for Speculative Decoding Algorithm. Neural Information Processing Systems (NeurIPS), 2024. Ming Yin, Minshuo Chen, Kaixuan Huang, Mengdi Wang. 30 Oct 2024.
- Watermarking Large Language Models and the Generated Content: Opportunities and Challenges. Asilomar Conference on Signals, Systems and Computers (ACSSC), 2024. Ruisi Zhang, F. Koushanfar. 24 Oct 2024. [WaLM]
- AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability. Sudhanshu Agrawal, Wonseok Jeon, Mingu Lee. 24 Oct 2024.
- big.LITTLE Vision Transformer for Efficient Visual Recognition. He Guo, Yulong Wang, Zixuan Ye, Jifeng Dai, Yuwen Xiong. 14 Oct 2024. [ViT]
- SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration. International Conference on Learning Representations (ICLR), 2024. Heming Xia, Yongqi Li, Jun Zhang, Cunxiao Du, Wenjie Li. 09 Oct 2024. [LRM]
- A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models. IEEE Circuits and Systems Magazine (IEEE CSM), 2024. Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen. 08 Oct 2024.
- ESPACE: Dimensionality Reduction of Activations for Model Compression. Neural Information Processing Systems (NeurIPS), 2024. Charbel Sakr, Brucek Khailany. 07 Oct 2024.
- Efficient Inference for Large Language Model-based Generative Recommendation. International Conference on Learning Representations (ICLR), 2024. Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua. 07 Oct 2024.
- Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference. Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun. 25 Sep 2024.
- Multi-Programming Language Ensemble for Code Generation in Large Language Model. Tengfei Xue, Xuefeng Li, Tahir Azim, Roman Smirnov, Jianhui Yu, Arash Sadrieh, Babak Pahlavan. 06 Sep 2024.
- Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar. 16 Aug 2024.
- Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding. Bin Xiao, Lujun Gui, Lei Su, Weipeng Chen. 01 Aug 2024.
- Adaptive Draft-Verification for Efficient Large Language Model Decoding. Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu. 27 Jun 2024.
- EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees. Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang. 24 Jun 2024.
- LiveMind: Low-latency Large Language Models with Simultaneous Inference. Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li. 20 Jun 2024. [LRM]
- Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving. Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang. 19 Jun 2024.
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding. Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou. 18 Jun 2024.
- Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction. Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang. 07 Jun 2024.
- Fast yet Safe: Early-Exiting with Risk Control. Metod Jazbec, Alexander Timans, Tin Hadži Veljković, K. Sakmann, Dan Zhang, C. A. Naesseth, Eric T. Nalisnick. 31 May 2024.
- SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths. Kaixuan Huang, Xudong Guo, M. Y. Wang. 30 May 2024.
- Faster Cascades via Speculative Decoding. Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, Seungyeon Kim, Neha Gupta, A. Menon, Sanjiv Kumar. 29 May 2024. [LRM]
- Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass. Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati. 28 May 2024.
- Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference. Hao Mark Chen, Wayne Luk, Ka-Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan. 28 May 2024.
- Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference. International Conference on Learning Representations (ICLR), 2024. Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel. 23 May 2024. [LRM]
- A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models. Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Vasu Sharma. 15 May 2024. [OffRL]
- Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge. Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, Weipeng Chen, Tengjiao Wang. 01 May 2024.
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing. Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, L. Lakshmanan, Ahmed Hassan Awadallah. 22 Apr 2024.
- Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity. Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica. 22 Apr 2024.
- TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding. Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen. 18 Apr 2024.
- Exploring and Improving Drafts in Blockwise Parallel Decoding. Taehyeon Kim, A. Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton. 14 Apr 2024. [AI4TS]
- Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding. North American Chapter of the Association for Computational Linguistics (NAACL), 2024. Jie Ou, Yueming Chen, Wenhong Tian. 10 Apr 2024.
- The Larger the Better? Improved LLM Code-Generation via Budget Reallocation. Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi. 31 Mar 2024.
- LLM Inference Unveiled: Survey and Roofline Model Insights. Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, ..., Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer. 26 Feb 2024.
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens. Huiping Zhuang, Jiahong Yu, Qianshi Pang, Zihao Wang, Huiping Zhuang, Cen Chen, Xiaofeng Zou. 24 Feb 2024.
- Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. Hanling Yi, Feng-Huei Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao. 19 Feb 2024. [LRM]
- Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs. Yeonhong Park, Jake Hyun, SangLyul Cho, Bonggeun Sim, Jae W. Lee. 16 Feb 2024. [MQ]
- Tandem Transformers for Inference Efficient LLMs. Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli. 13 Feb 2024.
- GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding. Cunxiao Du, Jing Jiang, Yuanchen Xu, Jiawei Wu, Sicheng Yu, ..., Shenggui Li, Kai Xu, Liqiang Nie, Zhaopeng Tu, Yang You. 03 Feb 2024.
- Break the Sequential Dependency of LLM Inference Using Lookahead Decoding. Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang. 03 Feb 2024.
- EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty. International Conference on Machine Learning (ICML), 2024. Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang. 26 Jan 2024.
- Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Chak Tou Leong, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui. 15 Jan 2024. [LRM]
- EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism. International Conference on Machine Learning (ICML), 2023. Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou. 08 Dec 2023. [LRM]
- Efficient Deep Speech Understanding at the Edge. Rongxiang Wang, Felix Lin. 22 Nov 2023.
- Speculative Contrastive Decoding. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan, Chang Zhou. 15 Nov 2023.