arXiv: 2310.05424
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
9 October 2023
Sangmin Bae, Jongwoo Ko, Hwanjun Song, Se-Young Yun
Papers citing "Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding" (50 of 56 papers shown)
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Tianyu Fu, Yichen You, Z. Chen, Guohao Dai, Huazhong Yang, Yu Wang · LRM · 11 Nov 2025

HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection
Irina Proskurina, Marc-Antoine Carpentier, Julien Velcin · VLM · 09 Nov 2025

Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models
Yue Zheng, Xiufang Shi, Jiming Chen, Yuanchao Shu · VLM · 18 Oct 2025
Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts
Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda · LRM · 08 Oct 2025

Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving
Haibo Hu, Lianming Huang, X. Wang, Yufei Cui, Shangyu Wu, Nan Guan, Chun Jason Xue · VLM · 02 Oct 2025

Choosing to Be Green: Advancing Green AI via Dynamic Model Selection
Emilio Cruciani, Roberto Verdecchia · 24 Sep 2025
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Sangmin Bae, Yujin Kim, Reza Bayat, S. Kim, Jiyoun Ha, …, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun · MoE · 14 Jul 2025

OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM Inference
Seungjun Shin, Jaehoon Oh, Dokwan Oh · 05 Jul 2025
AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving
Lianming Huang, Haibo Hu, Yufei Cui, Jiacheng Zuo, Shangyu Wu, Nan Guan, Chun Jason Xue · VLM · 04 Jun 2025

AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
Zhepei Wei, Wei-Lin Chen, Xinyu Zhu, Yu Meng · OffRL · 04 Jun 2025

Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
Yixuan Wang, Yijun Liu, Shiyu Ji, Yuzhuang Xu, Yang Xu, Qingfu Zhu, Wanxiang Che · OffRL, LRM · 24 May 2025
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jie Ou, Jinyu Guo, Shuaihong Jiang, Zhaokun Wang, Libo Qin, Shunyu Yao, Wenhong Tian · 3DV · 19 May 2025

DYNAMAX: Dynamic computing for Transformers and Mamba based architectures
Miguel Nogales, Matteo Gambella, Manuel Roveri · 29 Apr 2025

HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar, Shashank Nag, Jason Clemons, L. John, Poulami Das · 14 Apr 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, Shanghang Zhang · 27 Feb 2025

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
AAAI Conference on Artificial Intelligence (AAAI), 2025
Zhuomin He, Yizhen Yao, Pengfei Zuo, Bin Gao, Qinya Li, Zhenzhe Zheng, Fan Wu · 04 Jan 2025

The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Huixue Zhou, Hengrui Gu, Xi Liu, Kaixiong Zhou, Mingfu Liang, …, Wen-Yen Chen, Yiping Han, Bo Long, Rui Zhang, Tianlong Chen · 3DV · 04 Jan 2025
PrisonBreak: Jailbreaking Large Language Models with at Most Twenty-Five Targeted Bit-flips
Zachary Coalson, Jeonghyun Woo, Shiyang Chen, Yu Sun, …, Lishan Yang, Gururaj Saileshwar, Prashant J. Nair, Bo Fang, Sanghyun Hong · AAML · 10 Dec 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin, Yanzhao Wu · 05 Nov 2024

A Theoretical Perspective for Speculative Decoding Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Ming Yin, Minshuo Chen, Kaixuan Huang, Mengdi Wang · 30 Oct 2024

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
International Conference on Learning Representations (ICLR), 2024
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster · KELM · 28 Oct 2024
Dynamic layer selection in decoder-only transformers
Theodore Glavas, Joud Chataoui, Florence Regol, Wassim Jabbour, Antonios Valkanas, Boris N. Oreshkin, Mark Coates · AI4CE · 26 Oct 2024

Dynamic Vocabulary Pruning in Early-Exit LLMs
Jort Vincenti, Karim Abdel Sadek, Joan Velja, Matteo Nulli, Metod Jazbec · 24 Oct 2024

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Shwai He, Tao Ge, Zheyu Shen, Bowei Tian, Xiaoyang Wang, Ang Li · MoE · 17 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
International Conference on Learning Representations (ICLR), 2024
Heming Xia, Yongqi Li, Jun Zhang, Cunxiao Du, Wenjie Li · LRM · 09 Oct 2024

A-VL: Adaptive Attention for Large Vision-Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Junyang Zhang, Mu Yuan, Ruiguang Zhong, Puhan Luo, Huiyou Zhan, Ningkang Zhang, Chengchen Hu, Xiangyang Li · VLM · 23 Sep 2024

PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization
Federico Berto, Chuanbo Hua, Laurin Luttmann, Jiwoo Son, Junyoung Park, Kyuree Ahn, C. Kwon, Lin Xie, Jinkyoo Park · 05 Sep 2024
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang, Se-Young Yun · 24 Jun 2024

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin · 11 Jun 2024

Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai · 06 Jun 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun · 04 Jun 2024

Fast yet Safe: Early-Exiting with Risk Control
Metod Jazbec, Alexander Timans, Tin Hadži Veljković, K. Sakmann, Dan Zhang, C. A. Naesseth, Eric T. Nalisnick · 31 May 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong, Manasa Bharadwaj · 30 May 2024

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang, Xudong Guo, M. Y. Wang · 30 May 2024

A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Vasu Sharma · OffRL · 15 May 2024
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mi Zhou · BDL · 07 May 2024

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang · 29 Apr 2024
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao · 18 Apr 2024

Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jie Ou, Yueming Chen, Wenhong Tian · 10 Apr 2024

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella · 05 Apr 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, …, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer · 26 Feb 2024

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Weilin Zhao, Yuxiang Huang, Xu Han, Wang Xu, Chaojun Xiao, Xinrong Zhang, Yewei Fang, Kaihuo Zhang, Zhiyuan Liu, Maosong Sun · 21 Feb 2024
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Shuzhang Zhong, Zebin Yang, Meng Li, Ruihao Gong, Runsheng Wang, Ru Huang · 21 Feb 2024

HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Yashas Samaga, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli · 14 Feb 2024
A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao · 05 Feb 2024

Decoding Speculative Decoding
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman · LRM · 02 Feb 2024

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Xuchen Pan, Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou · 01 Feb 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia · 23 Dec 2023

Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao, Zhitian Xie, Chen Liang, Chenyi Zhuang, Jinjie Gu · 20 Dec 2023

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng, Yihuai Hong, Hongliang Dai, Huiping Zhuang, Cen Chen · 19 Dec 2023