arXiv: 2301.00774 (v3, latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
VLM
Links: arXiv (abs) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" (showing 50 of 665)
Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability
Ashhadul Islam
S. Belhaouari
Amine Bermak
232
0
0
24 Feb 2025
When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Weilan Wang
Yu Mao
Dongdong Tang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
329
4
0
24 Feb 2025
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang
Xiang Liu
Qian Wang
Peijie Dong
Bingsheng He
Xiaowen Chu
Bo Li
LRM
298
10
0
24 Feb 2025
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
International Conference on Learning Representations (ICLR), 2025
Qi Le
Enmao Diao
Ziyan Wang
Xinran Wang
Jie Ding
Li Yang
Ali Anwar
334
8
0
24 Feb 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qianli Ma
Dongrui Liu
Qian Chen
Linfeng Zhang
Jing Shao
MoMe
976
4
0
24 Feb 2025
Delta Decompression for MoE-based LLMs Compression
Hao Gu
Wei Li
Lujun Li
Qiyuan Zhu
Mark Lee
Shengjie Sun
Wei Xue
Wenhan Luo
MoE
345
18
0
24 Feb 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Computer Vision and Pattern Recognition (CVPR), 2025
Xiaoyi Qu
David Aponte
Colby R. Banbury
Daniel P. Robinson
Tianyu Ding
K. Koishida
Ilya Zharkov
Tianyi Chen
MQ
317
5
0
23 Feb 2025
Dynamic Low-Rank Sparse Adaptation for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Weizhong Huang
Yuxin Zhang
Xiawu Zheng
Wenshu Fan
Aiyue Chen
Yiwu Yao
Rongrong Ji
450
5
0
21 Feb 2025
PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation
Tao Fan
Guoqiang Ma
Yuanfeng Song
Lixin Fan
Kai Chen
237
2
0
21 Feb 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
627
3
0
19 Feb 2025
MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures
Jiayu Qin
Jianchao Tan
Xunliang Cai
Wei Wang
204
0
0
19 Feb 2025
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jiaqi Zhao
Miao Zhang
Ming Wang
Yuzhang Shang
Kaihao Zhang
Weili Guan
Yaowei Wang
Min Zhang
MQ
342
2
0
18 Feb 2025
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
Minxuan Lv
Zhenpeng Su
Leiyu Pan
Yizhe Xiong
Zijia Lin
...
Guiguang Ding
Cheng Luo
Di Zhang
Kun Gai
Songlin Hu
MoE
391
1
0
18 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
601
6
0
18 Feb 2025
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Bowei He
Lihao Yin
Hui-Ling Zhen
Xiaokun Zhang
Mingxuan Yuan
Chen Ma
395
2
0
18 Feb 2025
Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations
Dhananjay Saikumar
Blesson Varghese
209
0
0
18 Feb 2025
An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning
Cen-Jhih Li
Aditya Bhaskara
391
0
0
17 Feb 2025
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
Zhen Zhang
Yue Yang
Kai Zhen
Nathan Susanj
Athanasios Mouchtaris
Siegfried Kunzmann
Zheng Zhang
370
2
0
17 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
615
8
0
10 Feb 2025
Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective
Yuan Feng
Junlin Lv
Yuhang Cao
Xike Xie
S. Kevin Zhou
277
9
0
06 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
448
1
0
04 Feb 2025
Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation
Martin Genzel
Patrick Putzky
Pengfei Zhao
Siyang Song
Mattes Mollenhauer
Robert Seidel
Stefan Dietzel
Thomas Wollmann
249
0
0
03 Feb 2025
Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan
Tianao Zhang
Zhiteng Li
Yulun Zhang
MQ
561
4
0
03 Feb 2025
HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs
Mehdi Makni
Kayhan Behdin
Zheng Xu
Natalia Ponomareva
Rahul Mazumder
120
1
0
02 Feb 2025
Symmetric Pruning of Large Language Models
Kai Yi
Peter Richtárik
AAML
VLM
320
2
0
31 Jan 2025
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Yingtao Zhang
Diego Cerretti
Jialin Zhao
Wenjing Wu
Ziheng Liao
Umberto Michieli
C. Cannistraci
584
1
0
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
AAAI Conference on Artificial Intelligence (AAAI), 2024
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
373
1
0
28 Jan 2025
GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments
Yanyu Chen
Ganhong Huang
276
0
0
28 Jan 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
International Conference on Learning Representations (ICLR), 2024
Mohammad Mozaffari
Amir Yazdanbakhsh
Zhao Zhang
M. Dehnavi
379
13
0
28 Jan 2025
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
International Conference on Learning Representations (ICLR), 2025
Ayan Sengupta
Siddhant Chaudhary
Tanmoy Chakraborty
336
9
0
25 Jan 2025
Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models
International Conference on Performance Engineering (ICPE), 2025
Tom Wallace
Naser Ezzati-Jivan
Beatrice Ombuki-Berman
MQ
232
2
0
16 Jan 2025
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory
Annual Conference Computational Learning Theory (COLT), 2025
Jerry Chee
A. Backurs
Rainie Heck
Li Zhang
Janardhan Kulkarni
Thomas Rothvoss
Sivakanth Gopi
MQ
291
1
0
11 Jan 2025
Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning
Laura Puccioni
Alireza Farshin
Mariano Scazzariello
Changjie Wang
Marco Chiesa
Dejan Kostic
209
0
0
10 Jan 2025
Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
European Conference on Artificial Intelligence (ECAI), 2024
Danyal Aftab
Steven Davy
ALM
270
3
0
10 Jan 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
1.0K
1
0
08 Jan 2025
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Huixue Zhou
Hengrui Gu
Xi Liu
Kaixiong Zhou
Mingfu Liang
...
Wen-Yen Chen
Yiping Han
Bo Long
Rui Zhang
Tianlong Chen
3DV
185
4
0
04 Jan 2025
Lillama: Large Language Models Compression via Low-Rank Feature Distillation
Yaya Sy
Christophe Cerisara
Irina Illina
MQ
302
0
0
31 Dec 2024
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Computer Vision and Pattern Recognition (CVPR), 2024
Yifei Liu
Zhihang Zhong
Yifan Zhan
Sheng Xu
Xiao Sun
3DGS
482
16
0
29 Dec 2024
DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
445
0
0
28 Dec 2024
SlimGPT: Layer-wise Structured Pruning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Gui Ling
Ziyang Wang
Yuliang Yan
Qingwen Liu
207
27
0
24 Dec 2024
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
Binrui Zeng
Shezheng Song
Xiaodong Liu
Jie Yu
Shan Zhao
Jun Ma
Xiaopeng Li
Shasha Li
Xinran Hong
Yongtao Tang
MQ
304
1
0
24 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng
Songwei Liu
Shu Yang
Fangmin Chen
Lean Fu
Xing Mei
MQ
423
3
0
23 Dec 2024
HyperCLIP: Adapting Vision-Language models with Hypernetworks
Victor Akinwande
Mohammad Sadegh Norouzzadeh
Devin Willmott
Anna Bair
Madan Ravi Ganesh
J. Zico Kolter
CLIP
VLM
317
2
0
21 Dec 2024
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jorge García-Carrasco
A. Maté
Juan Trujillo
277
3
0
20 Dec 2024
FineGates: LLMs Finetuning with Compression using Stochastic Gates
Jonathan Svirsky
Yehonathan Refael
Ofir Lindenbaum
282
3
0
17 Dec 2024
C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yu Kang
Xianghui Sun
Liangyu Chen
Wei Zou
LRM
460
112
0
16 Dec 2024
QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Changhai Zhou
Yuhua Zhou
Shijie Han
Qian Qiao
Hongguang Li
MQ
212
0
0
16 Dec 2024
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
Lanxiang Hu
Tajana Rosing
Hao Zhang
244
2
0
15 Dec 2024
DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction
Symposium on Operating Systems Principles (SOSP), 2024
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
724
9
0
04 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
264
4
0
03 Dec 2024