arXiv: 2310.05424
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
9 October 2023
Sangmin Bae, Jongwoo Ko, Hwanjun Song, Se-Young Yun
Papers citing "Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding" (50 of 56 papers shown)
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Tianyu Fu, Yichen You, Z. Chen, Guohao Dai, Huazhong Yang, Yu Wang · LRM · 11 Nov 2025

HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection
Irina Proskurina, Marc-Antoine Carpentier, Julien Velcin · VLM · 09 Nov 2025

Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models
Yue Zheng, Xiufang Shi, Jiming Chen, Yuanchao Shu · VLM · 18 Oct 2025
Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts
Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda · LRM · 08 Oct 2025

Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving
Haibo Hu, Lianming Huang, X. Wang, Yufei Cui, Shangyu Wu, Nan Guan, Chun Jason Xue · VLM · 02 Oct 2025

Choosing to Be Green: Advancing Green AI via Dynamic Model Selection
Emilio Cruciani, Roberto Verdecchia · 24 Sep 2025
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Sangmin Bae, Yujin Kim, Reza Bayat, S. Kim, Jiyoun Ha, …, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun · MoE · 14 Jul 2025

OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM Inference
Seungjun Shin, Jaehoon Oh, Dokwan Oh · 05 Jul 2025
AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving
Lianming Huang, Haibo Hu, Yufei Cui, Jiacheng Zuo, Shangyu Wu, Nan Guan, Chun Jason Xue · VLM · 04 Jun 2025

AdaDecode: Accelerating LLM Decoding with Adaptive Layer Parallelism
Zhepei Wei, Wei-Lin Chen, Xinyu Zhu, Yu Meng · OffRL · 04 Jun 2025

Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
Yixuan Wang, Yijun Liu, Shiyu Ji, Yuzhuang Xu, Yang Xu, Qingfu Zhu, Wanxiang Che · OffRL, LRM · 24 May 2025
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jie Ou, Jinyu Guo, Shuaihong Jiang, Zhaokun Wang, Libo Qin, Shunyu Yao, Wenhong Tian · 3DV · 19 May 2025

DYNAMAX: Dynamic computing for Transformers and Mamba based architectures
Miguel Nogales, Matteo Gambella, Manuel Roveri · 29 Apr 2025

HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar, Shashank Nag, Jason Clemons, L. John, Poulami Das · 14 Apr 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu, Zining Liu, Zhenyuan Dong, Tianfan Peng, Bradley McDanel, Shanghang Zhang · 27 Feb 2025

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
AAAI Conference on Artificial Intelligence (AAAI), 2025
Zhuomin He, Yizhen Yao, Pengfei Zuo, Bin Gao, Qinya Li, Zhenzhe Zheng, Fan Wu · 04 Jan 2025

The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Huixue Zhou, Hengrui Gu, Xi Liu, Kaixiong Zhou, Mingfu Liang, …, Wen-Yen Chen, Yiping Han, Bo Long, Rui Zhang, Tianlong Chen · 3DV · 04 Jan 2025
PrisonBreak: Jailbreaking Large Language Models with at Most Twenty-Five Targeted Bit-flips
Zachary Coalson, Jeonghyun Woo, Shiyang Chen, Yu Sun, …, Lishan Yang, Gururaj Saileshwar, Prashant J. Nair, Bo Fang, Sanghyun Hong · AAML · 10 Dec 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin, Yanzhao Wu · 05 Nov 2024

A Theoretical Perspective for Speculative Decoding Algorithm
Neural Information Processing Systems (NeurIPS), 2024
Ming Yin, Minshuo Chen, Kaixuan Huang, Mengdi Wang · 30 Oct 2024

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
International Conference on Learning Representations (ICLR), 2024
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster · KELM · 28 Oct 2024
Dynamic layer selection in decoder-only transformers
Theodore Glavas, Joud Chataoui, Florence Regol, Wassim Jabbour, Antonios Valkanas, Boris N. Oreshkin, Mark Coates · AI4CE · 26 Oct 2024

Dynamic Vocabulary Pruning in Early-Exit LLMs
Jort Vincenti, Karim Abdel Sadek, Joan Velja, Matteo Nulli, Metod Jazbec · 24 Oct 2024

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Shwai He, Tao Ge, Zheyu Shen, Bowei Tian, Xiaoyang Wang, Ang Li · MoE · 17 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
International Conference on Learning Representations (ICLR), 2024
Heming Xia, Yongqi Li, Jun Zhang, Cunxiao Du, Wenjie Li · LRM · 09 Oct 2024

A-VL: Adaptive Attention for Large Vision-Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Junyang Zhang, Mu Yuan, Ruiguang Zhong, Puhan Luo, Huiyou Zhan, Ningkang Zhang, Chengchen Hu, Xiangyang Li · VLM · 23 Sep 2024

PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization
Federico Berto, Chuanbo Hua, Laurin Luttmann, Jiwoo Son, Junyoung Park, Kyuree Ahn, C. Kwon, Lin Xie, Jinkyoo Park · 05 Sep 2024
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang, Se-Young Yun · 24 Jun 2024

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin · 11 Jun 2024

Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai · 06 Jun 2024
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun · 04 Jun 2024

Fast yet Safe: Early-Exiting with Risk Control
Metod Jazbec, Alexander Timans, Tin Hadži Veljković, K. Sakmann, Dan Zhang, C. A. Naesseth, Eric T. Nalisnick · 31 May 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong, Manasa Bharadwaj · 30 May 2024

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang, Xudong Guo, M. Y. Wang · 30 May 2024

A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Vasu Sharma · OffRL · 15 May 2024
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mi Zhou · BDL · 07 May 2024

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang · 29 Apr 2024
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao · 18 Apr 2024

Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Jie Ou, Yueming Chen, Wenhong Tian · 10 Apr 2024

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella · 05 Apr 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, …, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer · 26 Feb 2024

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Weilin Zhao, Yuxiang Huang, Xu Han, Wang Xu, Chaojun Xiao, Xinrong Zhang, Yewei Fang, Kaihuo Zhang, Zhiyuan Liu, Maosong Sun · 21 Feb 2024
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Shuzhang Zhong, Zebin Yang, Meng Li, Ruihao Gong, Runsheng Wang, Ru Huang · 21 Feb 2024

HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Yashas Samaga, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli · 14 Feb 2024
A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao · 05 Feb 2024

Decoding Speculative Decoding
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman · LRM · 02 Feb 2024

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Xuchen Pan, Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou · 01 Feb 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia · 23 Dec 2023

Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao, Zhitian Xie, Chen Liang, Chenyi Zhuang, Jinjie Gu · 20 Dec 2023

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng, Yihuai Hong, Hongliang Dai, Huiping Zhuang, Cen Chen · 19 Dec 2023