arXiv:2301.00774 (v3, latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
VLM
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
GitHub (799★)
Papers citing
"SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
50 / 665 papers shown
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Qianhan Feng
Wenshuo Li
Tong Lin
Xinghao Chen
VLM
310
7
0
02 Dec 2024
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici
Davide Belli
M. V. Baalen
Amir Jalalirad
Andrii Skliar
Bence Major
Markus Nagel
Paul N. Whatmough
579
9
0
02 Dec 2024
Is Oracle Pruning the True Oracle?
Sicheng Feng
Keda Tao
Haoyu Wang
VLM
351
2
0
28 Nov 2024
Preserving Deep Representations In One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework
International Conference on Learning Representations (ICLR), 2024
Ryan Lucas
Rahul Mazumder
313
6
0
27 Nov 2024
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Andrii Skliar
T. V. Rozendaal
Romain Lepert
Todor Boinovski
M. V. Baalen
Markus Nagel
Paul N. Whatmough
B. Bejnordi
MoE
408
7
0
27 Nov 2024
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu
Hao Cheng
Yujie Fang
Zeyu Wang
Jiaheng Wei
Dongwei Xu
Qi Xuan
Xiaoniu Yang
Zhaowei Zhu
340
16
0
23 Nov 2024
Layer Pruning with Consensus: A Triple-Win Solution
IEEE Access (IEEE Access), 2024
Leandro Giusti Mugnaini
Carolina Tavares Duarte
Anna Helena Reali Costa
Artur Jordao
297
1
0
21 Nov 2024
AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning
Changhai Zhou
Shiyang Zhang
Yuhua Zhou
Zekai Liu
Shichao Weng
MQ
214
0
0
21 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
VLM
573
2
0
21 Nov 2024
From Pruning to Grafting: Dynamic Knowledge Redistribution via Learnable Layer Fusion
Zehua Pei
Hui-Ling Zhen
Xianzhi Yu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
AI4CE
530
5
0
21 Nov 2024
SAM Decoding: Speculative Decoding via Suffix Automaton
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuxuan Hu
Ke Wang
Jing Zhang
Fanjin Zhang
Xuefei Liu
Zeyang Zhang
Jing Zhang
475
18
0
16 Nov 2024
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Neural Information Processing Systems (NeurIPS), 2024
Y. Fu
Zhongzhi Yu
Junwei Li
Jiayi Qian
Yongan Zhang
Xiangchi Yuan
Dachuan Shi
Roman Yakunin
Y. Lin
285
7
0
15 Nov 2024
P² Law: Scaling Law for Post-Training After Model Pruning
Xiaodong Chen
Yuxuan Hu
Jing Zhang
Yanling Wang
Xuefei Liu
Zeyang Zhang
Jing Zhang
232
0
0
15 Nov 2024
Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism
Libo Wang
LRM
AI4CE
548
0
0
14 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
641
2
0
11 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
550
20
0
05 Nov 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Neural Information Processing Systems (NeurIPS), 2024
Yang Yue
Yulin Wang
Bingyi Kang
Yizeng Han
Shenzhi Wang
Shiji Song
Jiashi Feng
Gao Huang
VLM
299
67
0
04 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
Jiansheng Wei
589
14
0
04 Nov 2024
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
Neural Information Processing Systems (NeurIPS), 2024
Zheng Zhan
Yushu Wu
Yifan Gong
Zichong Meng
Zhenglun Kong
Changdi Yang
Geng Yuan
Pu Zhao
Wei Niu
Yanzhi Wang
VGen
205
14
0
02 Nov 2024
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Xuanlin Jiang
Yang Zhou
Shiyi Cao
Eric Liang
Minlan Yu
224
26
0
02 Nov 2024
MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
Yuanlin Duan
Wenqi Jia
Miao Yin
Yu Cheng
Bo Yuan
MoE
418
25
0
01 Nov 2024
The Impact of Inference Acceleration on Bias of LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Elisabeth Kirsten
Ivan Habernal
Vedant Nanda
Muhammad Bilal Zafar
356
0
0
29 Oct 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
488
20
0
29 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Neural Information Processing Systems (NeurIPS), 2024
Changwoo Lee
Soo Min Kwon
Qing Qu
Hun-Seok Kim
285
2
0
28 Oct 2024
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
Neural Information Processing Systems (NeurIPS), 2024
Ge Yang
Changyi He
Jinpei Guo
Jianyu Wu
Yifu Ding
Aishan Liu
Haotong Qin
Pengliang Ji
Xianglong Liu
MQ
279
9
0
28 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Charbel Sakr
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
MQ
554
0
0
28 Oct 2024
LEGO: Language Model Building Blocks
Shrenik Bhansali
Alwin Jin
Tyler Lizzo
Larry Heck
163
0
0
23 Oct 2024
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
International Conference on Learning Representations (ICLR), 2024
Ashish Khisti
MohammadReza Ebrahimi
Hassan Dbouk
Arash Behboodi
Roland Memisevic
Christos Louizos
333
2
0
23 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
International Conference on Learning Representations (ICLR), 2024
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
322
7
0
23 Oct 2024
Self-calibration for Language Model Quantization and Pruning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
1.0K
2
0
22 Oct 2024
Pruning Foundation Models for High Accuracy without Retraining
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Pu Zhao
Fei Sun
Xuan Shen
Pinrui Yu
Zhenglun Kong
Yanzhi Wang
Xue Lin
216
21
0
21 Oct 2024
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
Neural Information Processing Systems (NeurIPS), 2024
Jinda Jia
Cong Xie
Hanlin Lu
Daoce Wang
Hao Feng
...
Baixi Sun
Yanghua Peng
Zhi-Li Zhang
Xin Liu
Dingwen Tao
MQ
288
10
0
20 Oct 2024
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Oliver Sieberling
Denis Kuznedelev
Eldar Kurtic
Dan Alistarh
MQ
416
5
0
18 Oct 2024
GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning
Neural Information Processing Systems (NeurIPS), 2024
Guibin Zhang
Haonan Dong
Yuchen Zhang
Zhixun Li
Dingshuo Chen
Kai Wang
Tianlong Chen
Yuxuan Liang
Dawei Cheng
Kun Wang
285
5
0
17 Oct 2024
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng
Zhang Cao
Huaizhi Qu
Zhengyu Zhang
Chang Guo
Yanyong Zhang
Zhichao Cao
Tianlong Chen
304
5
0
17 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
International Conference on Learning Representations (ICLR), 2024
Zhenhong Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Cunchun Li
Yongbin Li
489
37
0
17 Oct 2024
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
Yingsong Luo
Ling Chen
MQ
247
0
0
16 Oct 2024
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Akriti Jain
Saransh Sharma
Koyel Mukherjee
Soumyabrata Pal
344
0
0
16 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
503
6
0
16 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
231
24
0
15 Oct 2024
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Shangqian Gao
Chi-Heng Lin
Ting Hua
Tang Zheng
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
243
19
0
15 Oct 2024
LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs
Volker Strobel
Marco Dorigo
Mario Fritz
LRM
288
12
0
15 Oct 2024
SLaNC: Static LayerNorm Calibration
Mahsa Salmani
Nikita Trukhanov
I. Soloveychik
MQ
244
0
0
14 Oct 2024
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Haiquan Lu
Yefan Zhou
Shiwei Liu
Zhangyang Wang
Michael W. Mahoney
Yaoqing Yang
146
23
0
14 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
815
24
0
14 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
239
6
0
12 Oct 2024
DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
Yanfeng Jiang
Zelan Yang
B. Chen
Shen Li
Tao Li
MQ
151
4
0
11 Oct 2024
QEFT: Quantization for Efficient Fine-Tuning of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Changhun Lee
Jun-gyu Jin
Eunhyeok Park
MQ
214
4
0
11 Oct 2024
Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Abhinav Bandari
L. Yin
Cheng-Yu Hsieh
Ajay Kumar Jaiswal
Tianlong Chen
Li Shen
Ranjay Krishna
Shiwei Liu
193
15
0
09 Oct 2024
Chip-Tuning: Classify Before Language Models Say
Fangwei Zhu
Dian Li
Jiajun Huang
Gang Liu
Hui Wang
Zhifang Sui
216
0
0
09 Oct 2024