A Fast Post-Training Pruning Framework for Transformers
arXiv:2204.09656 (v2, latest)
Neural Information Processing Systems (NeurIPS), 2022
29 March 2022
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami
Links: arXiv (abs) · PDF · HTML · GitHub (193★)
Papers citing "A Fast Post-Training Pruning Framework for Transformers"
50 / 69 papers shown

Rethinking Vision Transformer Depth via Structural Reparameterization
Chengwei Zhou, Vipin Chaudhary, Gourav Datta [ViT]
24 Nov 2025

StableMorph: High-Quality Face Morph Generation with Stable Diffusion
Wassim Kabbani, Kiran Raja, Raghavendra Ramachandra, C. Busch
11 Nov 2025

Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
Firas Gabetni, Giuseppe Curci, Andrea Pilzer, Subhankar Roy, Elisa Ricci, Gianni Franchi [AI4CE]
21 Oct 2025

Elastic ViTs from Pretrained Models without Retraining
Walter Simoncini, Michael Dorkenwald, Tijmen Blankevoort, Cees G. M. Snoek, Yuki Markus Asano [VLM]
20 Oct 2025

Don't Be Greedy, Just Relax! Pruning LLMs via Frank-Wolfe
Christophe Roux, Max Zimmer, Alexandre d’Aspremont, Sebastian Pokutta
15 Oct 2025

Efficient Large Language Models with Zero-Shot Adjustable Acceleration
Sajjad Kachuee, M. Sharifkhani
01 Sep 2025

Spatio-Temporal Pruning for Compressed Spiking Large Language Models
Yi Jiang, Malyaban Bal, Brian Matejek, Susmit Jha, Adam D. Cobb, Abhronil Sengupta
23 Aug 2025

GLASS: Test-Time Acceleration for LLMs via Global-Local Neural Importance Aggregation
Amirmohsen Sattarifard, Sepehr Lavasani, Ehsan Imani, Kunlin Zhang, Hanlin Xu, Fengyu Sun, Negar Hassanpour, Chao Gao [VLM]
19 Aug 2025

CompressKV: Semantic Retrieval Heads Know What Tokens are Not Important Before Generation
Xiaolin Lin, Jingcun Wang, Olga Kondrateva, Yiyu Shi, Bing Li, Grace Li Zhang [MQ, VLM]
04 Aug 2025

FAIR-Pruner: Leveraging Tolerance of Difference for Flexible Automatic Layer-Wise Neural Network Pruning
Chenqing Lin, Mostafa Hussien, Chengyao Yu, Bingyi Jing, M. Cheriet, Osama Abdelrahman, Ruixing Ming
04 Aug 2025

Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu, Xiuxiu Bai, Yang Liu, Jianwu Fang, Zehao Wu
01 Jul 2025

Olica: Efficient Structured Pruning of Large Language Models without Retraining
Jiujun He, Huazhen Lin
10 Jun 2025

SAFE: Finding Sparse and Flat Minima to Improve Pruning
Dongyeop Lee, Kwanhee Lee, Jinseok Chung, Namhoon Lee
07 Jun 2025

Smooth Model Compression without Fine-Tuning
Christina Runkel, Natacha Kuete Meli, Jovita Lukasik, A. Biguri, Carola-Bibiane Schönlieb, Michael Moeller
30 May 2025

TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Xiangyu Chen, Jing Liu, Ye Wang, Matthew Brand, Wang, T. Koike-Akino
27 May 2025

TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks
Proceedings on Privacy Enhancing Technologies (PoPETs), 2025
Mohammad Maheri, Hamed Haddadi, Alex Davidson
27 Apr 2025

COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
Ye Qiao, Zhiheng Cheng, Yian Wang, Yifan Zhang, Yunzhe Deng, Sitao Huang
22 Apr 2025

Payload-Aware Intrusion Detection with CMAE and Large Language Models
ACM Transactions on Privacy and Security (TOPS), 2025
Yongcheol Kim, Chanjae Lee, Young Yoon
23 Mar 2025

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Yinan Liang, Xiping Hu, Xiuwei Xu, Jie Zhou, Jiwen Lu [VLM, LRM]
19 Mar 2025

Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
Yuanze Li, Shihao Yuan, Haolin Wang, Qizhang Li, Ming-Yu Liu, Chen Xu, Guangming Shi, Wangmeng Zuo
17 Mar 2025

Towards Extreme Pruning of LLMs with Plug-and-Play Mixed Sparsity
Chi Xu, Gefei Zhang, Yantong Zhu, Luca Benini, Guosheng Hu, Yawei Li, Zhihong Zhang
14 Mar 2025

Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
ACM Computing Surveys, 2025
Xubin Wang, Zhiqing Tang, Jianxiong Guo, Tianhui Meng, Chenhao Wang, Tian-sheng Wang, Weijia Jia
08 Mar 2025

IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining
Yixiao Li, Xianzhi Du, Ajay Jaiswal, Tao Lei, T. Zhao, Chong-Jun Wang, Jianyu Wang
07 Mar 2025

Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs
Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Angelica I Aviles-Rivero, Chuanlong Xie, Yao Zhu
26 Feb 2025

Compressing Language Models for Specialized Domains
Miles Williams, G. Chrysostomou, Vitor Jeronymo, Nikolaos Aletras [MQ]
25 Feb 2025

FedSpaLLM: Federated Pruning of Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Guangji Bai, Yijiang Li, Zilinghan Li, Bo Pan, Kibaek Kim [FedML]
20 Feb 2025

Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment
International Conference on Artificial Neural Networks (ICANN), 2025
Xubin Wang, Weijia Jia
04 Jan 2025

TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
Lanxiang Hu, Tajana Rosing, Hao Zhang
15 Dec 2024

Redundant Queries in DETR-Based 3D Detection Methods: Unnecessary and Prunable
Lizhen Xu, Zehao Wu, Wenzhao Qiu, Xiuxiu Bai, K. Mei, Jianru Xue
03 Dec 2024

Layer Pruning with Consensus: A Triple-Win Solution
IEEE Access, 2024
Leandro Giusti Mugnaini, Carolina Tavares Duarte, Anna Helena Reali Costa, Artur Jordao
21 Nov 2024

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
International Conference on Learning Representations (ICLR), 2024
Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao
25 Oct 2024

Mixture Compressor for Mixture-of-Experts LLMs Gains More
International Conference on Learning Representations (ICLR), 2024
Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi [MoE]
08 Oct 2024

Aggressive Post-Training Compression on Extremely Large Language Models
Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang
30 Sep 2024

A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
22 Aug 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Wonbeom Lee, Jungi Lee, Junghwan Seo, Jaewoong Sim [RALM]
28 Jun 2024

DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation
Kairui Fu, Shengyu Zhang, Zheqi Lv, Jingyuan Chen, Jiwei Li
13 Jun 2024

Large Language Model Pruning
Hanjuan Huang, Hao-Jia Song, H. Pao
24 May 2024

Combining Relevance and Magnitude for Resource-Aware DNN Pruning
C. Chiasserini, F. Malandrino, Nuria Molner, Zhiqiang Zhao
21 May 2024

Decoupled Weight Decay for Any p Norm
N. Outmezguine, Noam Levi
16 Apr 2024

The Need for Speed: Pruning Transformers with One Recipe
Samir Khaki, Konstantinos N. Plataniotis
26 Mar 2024

AI and Memory Wall
A. Gholami, Z. Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer
21 Mar 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He, Jidong Zhai
18 Mar 2024

Training Machine Learning models at the Edge: A Survey
Aymen Rayane Khouas, Mohamed Reda Bouadjenek, Hakim Hacid, Sunil Aryal
05 Mar 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu, Zijun Chen, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Jiaming Song [MoE]
22 Feb 2024

TQCompressor: improving tensor decomposition methods in neural networks via permutations
V. Abronin, A. Naumov, D. Mazur, D. Bystrov, K. Tsarova, Ar. Melnikov, Ivan Oseledets, S. Dolgov, R. Brasher, M. Perelshtein
29 Jan 2024

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
International Conference on Machine Learning (ICML), 2024
Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao
22 Jan 2024

PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
Max Zimmer, Megi Andoni, Christoph Spiegel, Sebastian Pokutta [VLM]
23 Dec 2023

TULIP: Transformer for Upsampling of LiDAR Point Clouds
Bin Yang, Patrick Pfreundschuh, Roland Siegwart, Marco Hutter, Peyman Moghadam, Vaishakh Patil [3DPC]
11 Dec 2023

An LLM Compiler for Parallel Function Calling
Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, A. Gholami [LRM]
07 Dec 2023

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
22 Nov 2023

Page 1 of 2