Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.18911
Cited By
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
29 April 2024
Fangcheng Liu
Yehui Tang
Zhenhua Liu
Yunsheng Ni
Kai Han
Yunhe Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting"
22 / 22 papers shown
Title
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Z. Liu
Dong Li
E. Barsoum
51
0
0
23 Apr 2025
SD
2
^2
2
: Self-Distilled Sparse Drafters
Mike Lasby
Nish Sinnadurai
Valavan Manohararajah
Sean Lie
Vithursan Thangarasa
46
0
0
10 Apr 2025
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch
Lei Gao
Chaoyi Jiang
Murali Annavaram
LRM
23
0
0
08 Apr 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
J. Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
45
0
0
13 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
106
3
0
03 Mar 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
S. Zhang
85
0
0
27 Feb 2025
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Huixue Zhou
Hengrui Gu
Xi Liu
Kaixiong Zhou
Mingfu Liang
...
Wen-Yen Chen
Yiping Han
Bo Long
Rui Zhang
Tianlong Chen
3DV
31
1
0
04 Jan 2025
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
54
5
0
28 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
41
13
0
06 Oct 2024
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity
Michael R. Metel
Peng Lu
Boxing Chen
Mehdi Rezagholizadeh
I. Kobyzev
19
3
0
01 Oct 2024
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
Wenyue Hua
Mengting Wan
Shashank Vadrevu
Ryan Nadel
Yongfeng Zhang
Chi Wang
LLMAG
19
1
0
30 Sep 2024
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
Yael Segal-Feldman
Aviv Shamsian
Aviv Navon
Gill Hetz
Joseph Keshet
20
1
0
24 Sep 2024
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Xianzhen Luo
Yixuan Wang
Qingfu Zhu
Zhiming Zhang
Xuanyu Zhang
Qing Yang
Dongliang Xu
Wanxiang Che
18
3
0
16 Aug 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
16
17
0
08 Jul 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
75
1
0
24 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
28
5
0
30 May 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
Mengdi Wang
21
16
0
30 May 2024
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
Seungyeon Kim
Neha Gupta
A. Menon
Sanjiv Kumar
LRM
33
6
0
29 May 2024
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
Yunsheng Ni
Chuanjian Liu
Yehui Tang
Kai Han
Yunhe Wang
15
0
0
13 May 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
40
78
0
22 Apr 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
120
134
0
03 Feb 2024
PanGu-
π
π
π
: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
51
15
0
27 Dec 2023
1