v1, v2, v3 (latest)

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

International Conference on Machine Learning (ICML), 2023

2 January 2023

Elias Frantar

Dan Alistarh

VLM

ArXiv (abs), PDF, HTML, HuggingFace (3 upvotes), GitHub (799★)

Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

50 / 665 papers shown

PATCH: Learnable Tile-level Hybrid Sparsity for LLMs

Younes Hourri

Mohammad Mozaffari

M. Dehnavi

216

24 Dec 2025

Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

220

04 Dec 2025

Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models

128

01 Dec 2025

Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs

201

25 Nov 2025

Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models

132

25 Nov 2025

EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning

Songlin Zhao

Michael Pitts

Zhuwei Qin

25 Nov 2025

INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models

183

24 Nov 2025

FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning

377

24 Nov 2025

ModHiFi: Identifying High Fidelity predictive components for Model Modification

Chiranjib Bhattacharyya

138

24 Nov 2025

Towards Efficient VLMs: Information-Theoretic Driven Compression via Adaptive Structural Pruning

24 Nov 2025

Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models

108

24 Nov 2025

Exploiting the Experts: Unauthorized Compression in MoE-LLMs

Pinaki Prasad Guha Neogi

Ahmad Mohammadshirazi

Dheeraj Kulshrestha

R. Ramnath

MoE

147

22 Nov 2025

E³-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models

233

21 Nov 2025

Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers

134

20 Nov 2025

PocketLLM: Ultimate Compression of Large Language Models via Meta Networks

124

19 Nov 2025

Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models

222

19 Nov 2025

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

256

18 Nov 2025

MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity

Vladimír Macko

Vladimír Boža

136

17 Nov 2025

Weight-sparse transformers have interpretable circuits

232

17 Nov 2025

TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone

17 Nov 2025

Efficient Mathematical Reasoning Models via Dynamic Pruning and Knowledge Distillation

148

15 Nov 2025

A³: Attention-Aware Accurate KV Cache Fusion for Fast Large Language Model Serving

13 Nov 2025

EcoSpa: Efficient Transformer Training with Coupled Sparsity

...

09 Nov 2025

Ghost in the Transformer: Detecting Model Reuse with Invariant Spectral Signatures

160

09 Nov 2025

APP: Accelerated Path Patching with Task-Specific Pruning

07 Nov 2025

TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training

Anastasios Kyrillidis

172

06 Nov 2025

IG-Pruning: Input-Guided Block Pruning for Large Language Models

229

04 Nov 2025

Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes

Mohammadsajad Alipour

Mohammad Mohammadi Amiri

100

04 Nov 2025

Continual Learning, Not Training: Online Adaptation For Agents

Aman Jaglan

Jarrod Barnes

CLL

193

02 Nov 2025

AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs

David McCoy

Yulun Wu

Zachary Butzin-Dozier

123

02 Nov 2025

1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models

146

30 Oct 2025

NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium

167

29 Oct 2025

FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference

201

29 Oct 2025

PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs

401

27 Oct 2025

PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization

194

27 Oct 2025

Frustratingly Easy Task-aware Pruning for Large Language Models

142

26 Oct 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference

Divya J. Bajpai

M. Hanawal

MLLM VLM

211

26 Oct 2025

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

Omar Naim

Krish Sharma

Nicholas M. Asher

Nicholas Asher

26 Oct 2025

Scaling Up Efficient Small Language Models Serving and Deployment for Semantic Job Search

...

139

25 Oct 2025

The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models

25 Oct 2025

Beyond Uniform SVD: Dual-Level Optimization across Columns and Modules for LLM Compression

22 Oct 2025

ARA: Adaptive Rank Allocation for Efficient Large Language Model SVD Compression

124

22 Oct 2025

Restoring Pruned Large Language Models via Lost Component Compensation

141

22 Oct 2025

Elastic ViTs from Pretrained Models without Retraining

148

20 Oct 2025

The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

142

20 Oct 2025

From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models

139

20 Oct 2025

Mixed-Precision Quantization for Language Models: Techniques and Prospects

238

19 Oct 2025

Synera: Synergistic LLM Serving across Device and Cloud at Scale

113

17 Oct 2025

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

16 Oct 2025

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference

332

15 Oct 2025