ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.18911
  4. Cited By
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

29 April 2024
Fangcheng Liu
Yehui Tang
Zhenhua Liu
Yunsheng Ni
Kai Han
Yunhe Wang
ArXivPDFHTML

Papers citing "Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting"

22 / 22 papers shown
Title
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Z. Liu
Dong Li
E. Barsoum
51
0
0
23 Apr 2025
SD$^2$: Self-Distilled Sparse Drafters
SD2^22: Self-Distilled Sparse Drafters
Mike Lasby
Nish Sinnadurai
Valavan Manohararajah
Sean Lie
Vithursan Thangarasa
46
0
0
10 Apr 2025
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch
Lei Gao
Chaoyi Jiang
Murali Annavaram
LRM
23
0
0
08 Apr 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
J. Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
45
0
0
13 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
106
3
0
03 Mar 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
S. Zhang
85
0
0
27 Feb 2025
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Huixue Zhou
Hengrui Gu
Xi Liu
Kaixiong Zhou
Mingfu Liang
...
Wen-Yen Chen
Yiping Han
Bo Long
Rui Zhang
Tianlong Chen
3DV
31
1
0
04 Jan 2025
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
54
5
0
28 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
41
13
0
06 Oct 2024
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine
  Similarity
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity
Michael R. Metel
Peng Lu
Boxing Chen
Mehdi Rezagholizadeh
I. Kobyzev
19
3
0
01 Oct 2024
Interactive Speculative Planning: Enhance Agent Efficiency through
  Co-design of System and User Interface
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
Wenyue Hua
Mengting Wan
Shashank Vadrevu
Ryan Nadel
Yongfeng Zhang
Chi Wang
LLMAG
19
1
0
30 Sep 2024
Whisper in Medusa's Ear: Multi-head Efficient Decoding for
  Transformer-based ASR
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
Yael Segal-Feldman
Aviv Shamsian
Aviv Navon
Gill Hetz
Joseph Keshet
20
1
0
24 Sep 2024
Turning Trash into Treasure: Accelerating Inference of Large Language
  Models with Token Recycling
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Xianzhen Luo
Yixuan Wang
Qingfu Zhu
Zhiming Zhang
Xuanyu Zhang
Qing Yang
Dongliang Xu
Wanxiang Che
18
3
0
16 Aug 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in
  the Era of Large Language Models
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
16
17
0
08 Jul 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
75
1
0
24 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for
  Low-Memory GPUs
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
28
5
0
30 May 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
Mengdi Wang
21
16
0
30 May 2024
Faster Cascades via Speculative Decoding
Faster Cascades via Speculative Decoding
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
Seungyeon Kim
Neha Gupta
A. Menon
Sanjiv Kumar
LRM
33
6
0
29 May 2024
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating
  Large Language Models
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
Yunsheng Ni
Chuanjian Liu
Yehui Tang
Kai Han
Yunhe Wang
15
0
0
13 May 2024
A Survey on Efficient Inference for Large Language Models
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
40
78
0
22 Apr 2024
Break the Sequential Dependency of LLM Inference Using Lookahead
  Decoding
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
120
134
0
03 Feb 2024
PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity
  Compensation
PanGu-πππ: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
51
15
0
27 Dec 2023
1