Not All Layers of LLMs Are Necessary During Inference
arXiv:2403.02181 · 4 March 2024
Siqi Fan, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Shuo Shang, Aixin Sun, Yequan Wang, Zhongyuan Wang
Papers citing "Not All Layers of LLMs Are Necessary During Inference" (6 papers)
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model
Qiyuan Deng, X. Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang
13 Mar 2025 · OffRL
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
06 Oct 2024
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim
14 Feb 2024
One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space
Raghav Addanki, Chenyang Li, Zhao-quan Song, Chiwun Yang
24 Nov 2023
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
18 Feb 2022 · MoE
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste
31 Jan 2021 · MQ