Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding

9 October 2023
Sangmin Bae
Jongwoo Ko
Hwanjun Song
Se-Young Yun

Papers citing "Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding"

43 / 43 papers shown
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
26
0
0
14 Apr 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
S. Zhang
85
0
0
27 Feb 2025
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
Zhuomin He
Yizhen Yao
Pengfei Zuo
Bin Gao
Qinya Li
Zhenzhe Zheng
Fan Wu
43
0
0
04 Jan 2025
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Huixue Zhou
Hengrui Gu
Xi Liu
Kaixiong Zhou
Mingfu Liang
...
Wen-Yen Chen
Yiping Han
Bo Long
Rui Zhang
Tianlong Chen
3DV
41
1
0
04 Jan 2025
PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
Zachary Coalson
Jeonghyun Woo
Shiyang Chen
Yu Sun
Lishan Yang
Prashant J. Nair
Bo Fang
Sanghyun Hong
AAML
71
2
0
10 Dec 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
39
4
0
05 Nov 2024
A Theoretical Perspective for Speculative Decoding Algorithm
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
32
0
0
30 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
68
5
0
28 Oct 2024
Dynamic layer selection in decoder-only transformers
Theodore Glavas
Joud Chataoui
Florence Regol
Wassim Jabbour
Antonios Valkanas
Boris N. Oreshkin
Mark J. Coates
AI4CE
24
0
0
26 Oct 2024
Dynamic Vocabulary Pruning in Early-Exit LLMs
Jort Vincenti
Karim Abdel Sadek
Joan Velja
Matteo Nulli
Metod Jazbec
19
0
0
24 Oct 2024
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Shwai He
Tao Ge
Guoheng Sun
Bowei Tian
Xiaoyang Wang
Ang Li
MoE
46
1
0
17 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
46
4
0
09 Oct 2024
A-VL: Adaptive Attention for Large Vision-Language Models
Junyang Zhang
Mu Yuan
Ruiguang Zhong
Puhan Luo
Huiyou Zhan
Ningkang Zhang
Chengchen Hu
Xiangyang Li
VLM
41
1
0
23 Sep 2024
Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization
Federico Berto
Chuanbo Hua
Laurin Luttmann
Jiwoo Son
Junyoung Park
Kyuree Ahn
Changhyun Kwon
Lin Xie
Jinkyoo Park
33
1
0
05 Sep 2024
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi
Taehyeon Kim
Hongseok Jeung
Du-Seong Chang
Se-Young Yun
43
4
0
24 Jun 2024
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You
Yichao Fu
Zheng Wang
Amir Yazdanbakhsh
Yingyan Celine Lin
31
1
0
11 Jun 2024
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
Jiahao Liu
Qifan Wang
Jingang Wang
Xunliang Cai
30
6
0
06 Jun 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho
Sangmin Bae
Taehyeon Kim
Hyunjik Jo
Yireun Kim
Tal Schuster
Adam Fisch
James Thorne
Se-Young Yun
45
7
0
04 Jun 2024
Fast yet Safe: Early-Exiting with Risk Control
Metod Jazbec
Alexander Timans
Tin Hadži Veljković
K. Sakmann
Dan Zhang
C. A. Naesseth
Eric T. Nalisnick
38
5
0
31 May 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
33
5
0
30 May 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
Mengdi Wang
32
17
0
30 May 2024
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi
Vinija Jain
Mingye Gao
Malavika Srikanth
Aman Chadha
OffRL
28
1
0
15 May 2024
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang
Korawat Tanwisuth
Chengyue Gong
Pengcheng He
Mi Zhou
BDL
31
0
0
07 May 2024
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu
Yehui Tang
Zhenhua Liu
Yunsheng Ni
Kai Han
Yunhe Wang
33
23
0
29 Apr 2024
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
Pengfei Wu
Jiahao Liu
Zhuocheng Gong
Qifan Wang
Jinpeng Li
Jingang Wang
Xunliang Cai
Dongyan Zhao
20
1
0
18 Apr 2024
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
Jie Ou
Yueming Chen
Wenhong Tian
51
12
0
10 Apr 2024
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Ajay Jaiswal
Bodun Hu
Lu Yin
Yeonju Ro
Shiwei Liu
Tianlong Chen
Aditya Akella
43
12
0
05 Apr 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
37
79
0
26 Feb 2024
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Weilin Zhao
Yuxiang Huang
Xu Han
Wang Xu
Chaojun Xiao
Xinrong Zhang
Yewei Fang
Kaihuo Zhang
Zhiyuan Liu
Maosong Sun
35
10
0
21 Feb 2024
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Shuzhang Zhong
Zebin Yang
Meng Li
Ruihao Gong
Runsheng Wang
Ru Huang
32
6
0
21 Feb 2024
HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference
Yashas Samaga
Varun Yerram
Chong You
Srinadh Bhojanapalli
Sanjiv Kumar
Prateek Jain
Praneeth Netrapalli
49
4
0
14 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
29
27
0
05 Feb 2024
Decoding Speculative Decoding
Minghao Yan
Saurabh Agarwal
Shivaram Venkataraman
LRM
25
5
0
02 Feb 2024
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Xuchen Pan
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
15
8
0
01 Feb 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
61
76
0
23 Dec 2023
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
45
11
0
20 Dec 2023
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng
Yihuai Hong
Hongliang Dai
Huiping Zhuang
Cen Chen
11
10
0
19 Dec 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
70
110
0
12 Dec 2023
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Yinwei Dai
Rui Pan
Anand Iyer
Kai Li
Ravi Netravali
15
7
0
08 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
21
31
0
08 Dec 2023
SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification
Difan Jiao
Yilun Liu
Zhenwei Tang
Daniel Matter
Jürgen Pfeffer
Ashton Anderson
17
1
0
27 Nov 2023
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
32
7
0
15 Nov 2023
Distilling Linguistic Context for Language Model Compression
Geondo Park
Gyeongman Kim
Eunho Yang
45
37
0
17 Sep 2021