SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

International Conference on Machine Learning (ICML), 2023

2 January 2023

Elias Frantar

Dan Alistarh

VLM

Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

50 / 665 papers shown

Smooth Model Compression without Fine-Tuning

Carola-Bibiane Schönlieb

Michael Moeller

251

30 May 2025

DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration

205

29 May 2025

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

Decebal Constantin Mocanu

278

29 May 2025

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks

X. Meng

Mehdi Makni

Rahul Mazumder

204

29 May 2025

SlimLLM: Accurate Structured Pruning for Large Language Models

163

28 May 2025

ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning

239

28 May 2025

M-Wanda: Improving One-Shot Pruning for Multilingual LLMs

Rochelle Choenni

Ivan Titov

222

27 May 2025

DLP: Dynamic Layerwise Pruning in Large Language Models

262

27 May 2025

LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions

412

27 May 2025

TuneComp: Joint Fine-tuning and Compression for Large Foundation Models

254

27 May 2025

ResSVD: Residual Compensated SVD for Large Language Model Compression

342

26 May 2025

WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference

189

26 May 2025

Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs

...

265

26 May 2025

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

450

26 May 2025

μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts

219

24 May 2025

Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models

416

23 May 2025

How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve

Waleed Reda

Abhinav Jangda

Krishna Chintalapudi

297

23 May 2025

LatentLLM: Attention-Aware Joint Tensor Compression

231

23 May 2025

Two-Stage Regularization-Based Structured Pruning for LLMs

370

23 May 2025

Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse

Josh Alman

Zhao Song

368

22 May 2025

LLM-Powered AI Agent Systems and Their Applications in Industry

Guannan Liang

Qianqian Tong

LLMAG LM&Ro

328

22 May 2025

KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization

190

22 May 2025

TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning

Florentin Beck

William Rudman

Carsten Eickhoff

379

22 May 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

346

22 May 2025

Improved Methods for Model Pruning and Knowledge Distillation

20 May 2025

One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Rongguang Ye

Ming Tang

283

18 May 2025

Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform

Josh Alman

Zhao Song

349

17 May 2025

Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets

334

17 May 2025

Addition is almost all you need: Compressing neural networks with double binary factorization

Vladimír Boža

Vladimír Macko

511

16 May 2025

Accurate KV Cache Quantization with Outlier Tokens Tracing

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

336

16 May 2025

Semantic Retention and Extreme Compression in LLMs: Can We Have Both?

269

12 May 2025

FloE: On-the-Fly MoE Inference on Memory-constrained GPU

443

09 May 2025

Scalable LLM Math Reasoning Acceleration with Low-rank Distillation

300

08 May 2025

Onboard Optimization and Learning: A Survey

357

07 May 2025

Faster MoE LLM Inference for Extremely Large Models

246

06 May 2025

SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

Hanyu Hu

Xiaoming Yuan

218

06 May 2025

ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization

Stamatios Lefkimmiatis

N. Komodakis

Sergey Zagoruyko

VLM

1.1K

05 May 2025

Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models

1.0K

03 May 2025

Position: Enough of Scaling LLMs! Lets Focus on Downscaling

Ayan Sengupta

Tanmoy Chakraborty

404

02 May 2025

Efficient LLMs with AMP: Attention Heads and MLP Pruning

Leandro Giusti Mugnaini

Bruno Yamamoto

Lucas Lauton de Alcantara

267

29 Apr 2025

Legilimens: Performant Video Analytics on the System-on-Chip Edge

237

29 Apr 2025

BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters

272

29 Apr 2025

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference

International Conference on Learning Representations (ICLR), 2025

248

28 Apr 2025

L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference

214

24 Apr 2025

The Rise of Small Language Models in Healthcare: A Comprehensive Survey

486

23 Apr 2025

ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs

Fahmida Liza Piya

Rahmatollah Beheshti

638

23 Apr 2025

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

478

20 Apr 2025

Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator

Deepak K. Mathaikutty

Tushar Krishna

338

19 Apr 2025

From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs

493

18 Apr 2025

Sign-In to the Lottery: Reparameterizing Sparse Training From Scratch

392

17 Apr 2025