SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

International Conference on Machine Learning (ICML), 2023

2 January 2023

Elias Frantar

Dan Alistarh

VLM

Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

50 / 665 papers shown

Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions

331

17 Apr 2025

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)

Anshumali Shrivastava

280

15 Apr 2025

Understanding and Optimizing Multi-Stage AI Inference Pipelines

986

14 Apr 2025

TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

368

14 Apr 2025

HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving

460

14 Apr 2025

Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash

743

11 Apr 2025

SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting

International Symposium on Computer Architecture (ISCA), 2025

298

11 Apr 2025

SD^2: Self-Distilled Sparse Drafters

Mike Lasby

Nish Sinnadurai

Valavan Manohararajah

Sean Lie

Yani Andrew Ioannou

Vithursan Thangarasa

785

10 Apr 2025

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

426

07 Apr 2025

AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design

247

07 Apr 2025

Saliency-driven Dynamic Token Pruning for Large Language Models

449

06 Apr 2025

Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression

Ivan Ilin

Peter Richtárik

163

06 Apr 2025

Compression Laws for Large Language Models

Ayan Sengupta

Siddhant Chaudhary

Tanmoy Chakraborty

205

06 Apr 2025

Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability

Vishnu Kabir Chhabra

Mohammad Mahdi Khalili

AI4CE

242

05 Apr 2025

Entropy-Based Block Pruning for Efficient Large Language Models

210

04 Apr 2025

When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models

527

02 Apr 2025

SQuat: Subspace-orthogonal KV Cache Quantization

369

31 Mar 2025

Model Hemorrhage and the Robustness Limits of Large Language Models

317

31 Mar 2025

Task-Aware Parameter-Efficient Fine-Tuning of Large Pre-Trained Models at the Edge

259

29 Mar 2025

Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models

289

28 Mar 2025

STADE: Standard Deviation as a Pruning Metric

Diego Coello de Portugal Mecke

Haya Alyoussef

Ilia Koloiarov

Lars Schmidt-Thieme

305

28 Mar 2025

As easy as PIE: understanding when pruning causes language models to disagree

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

246

27 Mar 2025

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Abdelrahman M. Shaker

916

27 Mar 2025

Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs

271

24 Mar 2025

Energy-Aware LLMs: A step towards sustainable AI for downstream applications

Nguyen Phuc Tran

Brigitte Jaumard

Oscar Delgado

202

22 Mar 2025

Large Language Model Compression via the Nested Activation-Aware Decomposition

236

21 Mar 2025

Accelerating Transformer Inference and Training with 2:4 Activation Sparsity

371

20 Mar 2025

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models

Computer Vision and Pattern Recognition (CVPR), 2025

250

19 Mar 2025

Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency

930

18 Mar 2025

Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process

298

17 Mar 2025

ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts

923

17 Mar 2025

SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

258

16 Mar 2025

PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing

...

950

15 Mar 2025

Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs

457

15 Mar 2025

Towards Extreme Pruning of LLMs with Plug-and-Play Mixed Sparsity

180

14 Mar 2025

Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores

European Conference on Computer Systems (EuroSys), 2025

210

13 Mar 2025

Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency

IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025

280

10 Mar 2025

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models

250

10 Mar 2025

Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation

226

09 Mar 2025

IteRABRe: Iterative Recovery-Aided Block Reduction

Haryo Akbarianto Wibowo

274

08 Mar 2025

Sample-aware Adaptive Structured Pruning for Large Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2025

215

08 Mar 2025

Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model

...

OffRL LRM MLLM KELM VLM

350

06 Mar 2025

Wanda++: Pruning Large Language Models via Regional Gradients

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

567

06 Mar 2025

How can representation dimension dominate structurally pruned LLMs?

Mingxue Xu

Lisa Alazraki

Danilo Mandic

264

06 Mar 2025

Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size

436

06 Mar 2025

Revisiting Large Language Model Pruning using Neuron Semantic Attribution

177

03 Mar 2025

RSQ: Learning from Important Tokens Leads to Better Quantized LLMs

256

03 Mar 2025

Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs

308

26 Feb 2025

CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging

273

26 Feb 2025

Compressing Language Models for Specialized Domains

304

25 Feb 2025