arXiv:2404.08763
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
12 April 2024
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
Papers citing "CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models"
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
Chuan Sun, Han Yu, Lizhen Cui, Xiaoxiao Li
03 May 2025
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu (Allen) Zhang, Zechun Liu, Yuandong Tian, Harshit Khaitan, Z. Wang, Steven Li
28 Apr 2025
Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
Roberto Garcia, Jerry Liu, Daniel Sorvisto, Sabri Eyuboglu
23 Mar 2025
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici, Davide Belli, M. V. Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul N. Whatmough
02 Dec 2024
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta, C. C. D. Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
06 Oct 2023
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
MoE
11 Oct 2022
Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU
Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuanyuan Wen, Pavlos Petoumenos, Yu Wang
07 Sep 2022
Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE
18 Feb 2022
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
MoE
24 Sep 2021
Carbon Emissions and Large Neural Network Training
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean
AI4CE
21 Apr 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste
MQ
31 Jan 2021
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
MQ
31 Dec 2020