arXiv:2404.06954
Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy
10 April 2024
Yijin Liu, Fandong Meng, Jie Zhou
Papers citing "Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy" (8 papers)
DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
Ning Yang, Fangxin Liu, Junjie Wang, Tao Yang, Kan Liu, Haibing Guan, Li Jiang
23 May 2025
KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization
Mingbo Song, Heming Xia, Jun Zhang, Chak Tou Leong, Qiancheng Xu, Wenjie Li, Sujian Li
22 May 2025
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
Zhuomin He, Yizhen Yao, Pengfei Zuo, Bin Gao, Qinya Li, Zhenzhe Zheng, Fan Wu
04 Jan 2025
Dynamic layer selection in decoder-only transformers
Theodore Glavas, Joud Chataoui, Florence Regol, Wassim Jabbour, Antonios Valkanas, Boris N. Oreshkin, Mark Coates
26 Oct 2024
Understanding Layer Significance in LLM Alignment
Guangyuan Shi, Zexin Lu, Xiaoyu Dong, Wenlong Zhang, Xuanyu Zhang, Yujie Feng, Xiao-Ming Wu
23 Oct 2024
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Akriti Jain, Saransh Sharma, Koyel Mukherjee, Soumyabrata Pal
16 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang, Cunxiao Du, Wenjie Li
09 Oct 2024
Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules
Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan
09 Jul 2024