arXiv: 2301.00774 (v3, latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
VLM
Links: arXiv (abs) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" (showing 50 of 665)
Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability
Ashhadul Islam
S. Belhaouari
Amine Bermak
232
0
0
24 Feb 2025
When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Weilan Wang
Yu Mao
Dongdong Tang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
329
4
0
24 Feb 2025
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang
Xiang Liu
Qian Wang
Peijie Dong
Bingsheng He
Xiaowen Chu
Bo Li
LRM
298
10
0
24 Feb 2025
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
International Conference on Learning Representations (ICLR), 2025
Qi Le
Enmao Diao
Ziyan Wang
Xinran Wang
Jie Ding
Li Yang
Ali Anwar
334
8
0
24 Feb 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qianli Ma
Dongrui Liu
Qian Chen
Linfeng Zhang
Jing Shao
MoMe
976
4
0
24 Feb 2025
Delta Decompression for MoE-based LLMs Compression
Hao Gu
Wei Li
Lujun Li
Qiyuan Zhu
Mark Lee
Shengjie Sun
Wei Xue
Wenhan Luo
MoE
345
18
0
24 Feb 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Computer Vision and Pattern Recognition (CVPR), 2025
Xiaoyi Qu
David Aponte
Colby R. Banbury
Daniel P. Robinson
Tianyu Ding
K. Koishida
Ilya Zharkov
Tianyi Chen
MQ
317
5
0
23 Feb 2025
Dynamic Low-Rank Sparse Adaptation for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Weizhong Huang
Yuxin Zhang
Xiawu Zheng
Wenshu Fan
Aiyue Chen
Yiwu Yao
Rongrong Ji
450
5
0
21 Feb 2025
PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation
Tao Fan
Guoqiang Ma
Yuanfeng Song
Lixin Fan
Kai Chen
237
2
0
21 Feb 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
627
3
0
19 Feb 2025
MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures
Jiayu Qin
Jianchao Tan
Xunliang Cai
Wei Wang
204
0
0
19 Feb 2025
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jiaqi Zhao
Miao Zhang
Ming Wang
Yuzhang Shang
Kaihao Zhang
Weili Guan
Yaowei Wang
Min Zhang
MQ
342
2
0
18 Feb 2025
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
Minxuan Lv
Zhenpeng Su
Leiyu Pan
Yizhe Xiong
Zijia Lin
...
Guiguang Ding
Cheng Luo
Di Zhang
Kun Gai
Songlin Hu
MoE
391
1
0
18 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
601
6
0
18 Feb 2025
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Bowei He
Lihao Yin
Hui-Ling Zhen
Xiaokun Zhang
Mingxuan Yuan
Chen Ma
395
2
0
18 Feb 2025
Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations
Dhananjay Saikumar
Blesson Varghese
209
0
0
18 Feb 2025
An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning
Cen-Jhih Li
Aditya Bhaskara
391
0
0
17 Feb 2025
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
Zhen Zhang
Yue Yang
Kai Zhen
Nathan Susanj
Athanasios Mouchtaris
Siegfried Kunzmann
Zheng Zhang
370
2
0
17 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
615
8
0
10 Feb 2025
Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective
Yuan Feng
Junlin Lv
Yuhang Cao
Xike Xie
S. Kevin Zhou
277
9
0
06 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
448
1
0
04 Feb 2025
Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation
Martin Genzel
Patrick Putzky
Pengfei Zhao
Siyang Song
Mattes Mollenhauer
Robert Seidel
Stefan Dietzel
Thomas Wollmann
249
0
0
03 Feb 2025
Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan
Tianao Zhang
Zhiteng Li
Yulun Zhang
MQ
561
4
0
03 Feb 2025
HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs
Mehdi Makni
Kayhan Behdin
Zheng Xu
Natalia Ponomareva
Rahul Mazumder
120
1
0
02 Feb 2025
Symmetric Pruning of Large Language Models
Kai Yi
Peter Richtárik
AAML
VLM
320
2
0
31 Jan 2025
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Yingtao Zhang
Diego Cerretti
Jialin Zhao
Wenjing Wu
Ziheng Liao
Umberto Michieli
C. Cannistraci
584
1
0
31 Jan 2025
Merino: Entropy-driven Design for Generative Language Models on IoT Devices
AAAI Conference on Artificial Intelligence (AAAI), 2024
Youpeng Zhao
Ming Lin
Huadong Tang
Qiang Wu
Jun Wang
373
1
0
28 Jan 2025
GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments
Yanyu Chen
Ganhong Huang
276
0
0
28 Jan 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
International Conference on Learning Representations (ICLR), 2024
Mohammad Mozaffari
Amir Yazdanbakhsh
Zhao Zhang
M. Dehnavi
379
13
0
28 Jan 2025
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
International Conference on Learning Representations (ICLR), 2025
Ayan Sengupta
Siddhant Chaudhary
Tanmoy Chakraborty
336
9
0
25 Jan 2025
Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models
International Conference on Performance Engineering (ICPE), 2025
Tom Wallace
Naser Ezzati-Jivan
Beatrice Ombuki-Berman
MQ
232
2
0
16 Jan 2025
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory
Annual Conference Computational Learning Theory (COLT), 2025
Jerry Chee
A. Backurs
Rainie Heck
Li Zhang
Janardhan Kulkarni
Thomas Rothvoss
Sivakanth Gopi
MQ
291
1
0
11 Jan 2025
Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning
Laura Puccioni
Alireza Farshin
Mariano Scazzariello
Changjie Wang
Marco Chiesa
Dejan Kostic
209
0
0
10 Jan 2025
Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
European Conference on Artificial Intelligence (ECAI), 2024
Danyal Aftab
Steven Davy
ALM
270
3
0
10 Jan 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
1.0K
1
0
08 Jan 2025
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Huixue Zhou
Hengrui Gu
Xi Liu
Kaixiong Zhou
Mingfu Liang
...
Wen-Yen Chen
Yiping Han
Bo Long
Rui Zhang
Tianlong Chen
3DV
185
4
0
04 Jan 2025
Lillama: Large Language Models Compression via Low-Rank Feature Distillation
Yaya Sy
Christophe Cerisara
Irina Illina
MQ
302
0
0
31 Dec 2024
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Computer Vision and Pattern Recognition (CVPR), 2024
Yifei Liu
Zhihang Zhong
Yifan Zhan
Sheng Xu
Xiao Sun
3DGS
482
16
0
29 Dec 2024
DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
445
0
0
28 Dec 2024
SlimGPT: Layer-wise Structured Pruning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Gui Ling
Ziyang Wang
Yuliang Yan
Qingwen Liu
207
27
0
24 Dec 2024
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
Binrui Zeng
Shezheng Song
Xiaodong Liu
Jie Yu
Shan Zhao
Jun Ma
Xiaopeng Li
Shasha Li
Xinran Hong
Yongtao Tang
MQ
304
1
0
24 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng
Songwei Liu
Shu Yang
Fangmin Chen
Lean Fu
Xing Mei
MQ
423
3
0
23 Dec 2024
HyperCLIP: Adapting Vision-Language models with Hypernetworks
Victor Akinwande
Mohammad Sadegh Norouzzadeh
Devin Willmott
Anna Bair
Madan Ravi Ganesh
J. Zico Kolter
CLIP
VLM
317
2
0
21 Dec 2024
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jorge García-Carrasco
A. Maté
Juan Trujillo
277
3
0
20 Dec 2024
FineGates: LLMs Finetuning with Compression using Stochastic Gates
Jonathan Svirsky
Yehonathan Refael
Ofir Lindenbaum
282
3
0
17 Dec 2024
C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yu Kang
Xianghui Sun
Liangyu Chen
Wei Zou
LRM
460
112
0
16 Dec 2024
QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Changhai Zhou
Yuhua Zhou
Shijie Han
Qian Qiao
Hongguang Li
MQ
212
0
0
16 Dec 2024
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
Lanxiang Hu
Tajana Rosing
Hao Zhang
244
2
0
15 Dec 2024
DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction
Symposium on Operating Systems Principles (SOSP), 2024
Yanqi Zhang
Yuwei Hu
Runyuan Zhao
John C. S. Lui
Haibo Chen
MQ
724
9
0
04 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
264
4
0
03 Dec 2024