
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
arXiv: 2301.00774 (v3, latest)
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar, Dan Alistarh
Links: arXiv (abs) · PDF · HTML · Hugging Face · GitHub

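SparseGPT itself prunes with a second-order, Hessian-based weight reconstruction; as a point of reference for what "one-shot" pruning means, here is a minimal magnitude-pruning baseline, the simple method such approaches are commonly compared against. This is an illustrative sketch, not the paper's algorithm; the function name and sparsity setting are our own.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with smallest magnitude, in one shot."""
    k = int(W.size * sparsity)              # number of weights to remove
    if k == 0:
        return W.copy(), np.ones_like(W, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    mask = np.abs(W) > thresh               # keep strictly larger magnitudes
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W_sparse, mask = magnitude_prune(W, sparsity=0.5)
print(f"kept {mask.mean():.0%} of weights")  # kept 50% of weights
```

One-shot here means the mask is computed once from the dense weights and applied directly, with no retraining; SparseGPT's contribution is making this accurate at GPT scale by also updating the surviving weights.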
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

Showing 50 of 665 citing papers.
Smooth Model Compression without Fine-Tuning
Christina Runkel, Natacha Kuete Meli, Jovita Lukasik, A. Biguri, Carola-Bibiane Schönlieb, Michael Moeller
30 May 2025

DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration
Tianteng Gu, Bei Liu, Bo Xiao, Ke Zeng, Jiacheng Liu, Y. Qian
29 May 2025

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Q. Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu
29 May 2025

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks
X. Meng, Mehdi Makni, Rahul Mazumder
29 May 2025

SlimLLM: Accurate Structured Pruning for Large Language Models
Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
28 May 2025

ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi, Zhenglun Kong, Geng Yuan, Shaoyi Huang
28 May 2025

M-Wanda: Improving One-Shot Pruning for Multilingual LLMs
Rochelle Choenni, Ivan Titov
27 May 2025

DLP: Dynamic Layerwise Pruning in Large Language Models
Yuli Chen, B. Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang
27 May 2025

LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
Hadi Askari, Shivanshu Gupta, Fei Wang, Anshuman Chhabra, Muhao Chen
27 May 2025

TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Xiangyu Chen, Jing Liu, Ye Wang, Matthew Brand, Wang, T. Koike-Akino
27 May 2025

ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, Huan Wang
26 May 2025

WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
Sihan Chen, Dan Zhao, Jongwoo Ko, Colby R. Banbury, Huiping Zhuang, Luming Liang, Tianyi Chen
26 May 2025

Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs
Hanting Chen, Jiarui Qin, Jialong Guo, Tao Yuan, Yichun Yin, ..., Can Chen, Xinghao Chen, Fisher Yu, Ruiming Tang, Yunhe Wang
26 May 2025

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li
26 May 2025

μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
T. Koike-Akino, Jing Liu, Ye Wang
24 May 2025

Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models
Viktoriia Chekalina, Daniil Moskovskiy, Daria Cherniuk, Maxim Kurkin, Andrey Kuznetsov, Evgeny Frolov
23 May 2025

How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve
Waleed Reda, Abhinav Jangda, Krishna Chintalapudi
23 May 2025

LatentLLM: Attention-Aware Joint Tensor Compression
T. Koike-Akino, Xiangyu Chen, Jing Liu, Ye Wang, Wang, Matthew Brand
23 May 2025

Two-Stage Regularization-Based Structured Pruning for LLMs
Mingkuan Feng, Jinyang Wu, Siyuan Liu, Shuai Zhang, Hongjian Fang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen
23 May 2025

Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse
Josh Alman, Zhao Song
22 May 2025

LLM-Powered AI Agent Systems and Their Applications in Industry
Guannan Liang, Qianqian Tong
22 May 2025

KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization
Mingbo Song, Heming Xia, Jun Zhang, Chak Tou Leong, Qiancheng Xu, Wenjie Li, Sujian Li
22 May 2025

TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning
Florentin Beck, William Rudman, Carsten Eickhoff
22 May 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yue Li, Xin Yi, Dongsheng Shi, Gerard de Melo, Xiaoling Wang, Linlin Wang
22 May 2025

Improved Methods for Model Pruning and Knowledge Distillation
Wei Jiang, Anying Fu, Youling Zhang
20 May 2025

One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Rongguang Ye, Ming Tang
18 May 2025

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman, Zhao Song
17 May 2025

Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
Ning Lu, Shengcai Liu, Jiahao Wu, Weiyu Chen, Zhirui Zhang, Yew-Soon Ong, Qi Wang, Ke Tang
17 May 2025

Addition is almost all you need: Compressing neural networks with double binary factorization
Vladimír Boža, Vladimír Macko
16 May 2025

Accurate KV Cache Quantization with Outlier Tokens Tracing
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yi Su, Yuechi Zhou, Quantong Qiu, Jilong Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang
16 May 2025

Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde, Martin Cousseau, Antoun Yaacoub, Lionel Prevost
12 May 2025

FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou, Zheng Li, Junxuan Zhang, Jue Wang, Yanjie Wang, Zhongle Xie, Ke Chen, Lidan Shou
09 May 2025

Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong, Bilge Acun, Beidi Chen, Yuejie Chi
08 May 2025

Onboard Optimization and Learning: A Survey
Monirul Islam Pavel, Siyi Hu, Mahardhika Pratama, Ryszard Kowalczyk
07 May 2025

Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang, Luohe Shi, Qiwei Li, Zuchao Li, Ping Wang, Bo Du, Mengjia Shen, Hai Zhao
06 May 2025

SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu, Xiaoming Yuan
06 May 2025

ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, N. Komodakis, Sergey Zagoruyko
05 May 2025

Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
Chuan Sun, Han Yu, Lizhen Cui, Xiaoxiao Li
03 May 2025

Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta, Tanmoy Chakraborty
02 May 2025

Efficient LLMs with AMP: Attention Heads and MLP Pruning
Leandro Giusti Mugnaini, Bruno Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Edson Bollis, Lucas Pellicer, A. H. R. Costa, Artur Jordao
29 Apr 2025

Legilimens: Performant Video Analytics on the System-on-Chip Edge
M. Ramanujam, Yinwei Dai, Kyle Jamieson, Ravi Netravali
29 Apr 2025

BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters
Baz Roland, Kristina Malyseva, Anna Pappa, Tristan Cazenave
29 Apr 2025

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
International Conference on Learning Representations (ICLR), 2025
Zhenyu Zhang, Zechun Liu, Yuandong Tian, Harshit Khaitan, Liang Luo, Steven Li
28 Apr 2025

L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Qingyuan Liu, Liyan Chen, Yanning Yang, Haoyu Wang, Dong Du, Zhigang Mao, Naifeng Jing, Yubin Xia, Haibo Chen
24 Apr 2025

The Rise of Small Language Models in Healthcare: A Comprehensive Survey
Muskan Garg, Shaina Raza, Shebuti Rayana, Xingyi Liu, Sunghwan Sohn
23 Apr 2025

ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya, Rahmatollah Beheshti
23 Apr 2025

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang
20 Apr 2025

Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna
19 Apr 2025

From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni, Jiachen Pu, Zhongyi Yang, Kun Zhou, Hui Wang, Xiaoliang Xiao, Dakui Wang, Xin Li, Jingfeng Luo, Conggang Hu
18 Apr 2025

Sign-In to the Lottery: Reparameterizing Sparse Training From Scratch
Advait Gadhikar, Tom Jacobs, Chao Zhou, R. Burkholz
17 Apr 2025