ZipLM: Inference-Aware Structured Pruning of Language Models

Neural Information Processing Systems (NeurIPS), 2023
7 February 2023
Eldar Kurtic
Elias Frantar
Dan Alistarh
arXiv: 2302.04089

Papers citing "ZipLM: Inference-Aware Structured Pruning of Language Models"

27 citing papers
C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression
Baptiste Bauvin
Loïc Baret
Ola Ahmad
21 Oct 2025
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
Firas Gabetni
Giuseppe Curci
Andrea Pilzer
Subhankar Roy
Elisa Ricci
Gianni Franchi
21 Oct 2025
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo
Shengkun Tang
Cong Zeng
Zhiqiang Shen
13 Oct 2025
PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
Xin Yu
Cong Xie
Ziyu Zhao
Tiantian Fan
Lingzhou Xue
Zhi-Li Zhang
30 Sep 2025
Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches
Shirin Alanova
Kristina Kazistova
Ekaterina Galaeva
Alina Kostromina
Vladimir Smirnov
Redko Dmitry
Alexey Dontsov
Maxim Zhelnin
Evgeny Burnaev
Egor Shvetsov
26 Sep 2025
ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai
Siyong Jian
Tuo Liang
Yu Yin
Huan Wang
26 May 2025
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu
Xiaoming Yuan
06 May 2025
TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks
Proceedings on Privacy Enhancing Technologies (PoPETs), 2025
Mohammad Maheri
Hamed Haddadi
Alex Davidson
27 Apr 2025
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang
Ligong Han
Kai Xu
Akash Srivastava
31 Mar 2025
Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
Yuanze Li
Shihao Yuan
Haolin Wang
Qizhang Li
Ming-Yu Liu
Chen Xu
Guangming Shi
Wangmeng Zuo
17 Mar 2025
Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs
Xuan Ding
Rui Sun
Yunjian Zhang
Xiu Yan
Yueqi Zhou
Kaihao Huang
Suzhong Fu
Angelica I Aviles-Rivero
Chuanlong Xie
Yao Zhu
26 Feb 2025
SlimGPT: Layer-wise Structured Pruning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Gui Ling
Ziyang Wang
Yuliang Yan
Qingwen Liu
24 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
18 Dec 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
11 Nov 2024
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Oliver Sieberling
Denis Kuznedelev
Eldar Kurtic
Dan Alistarh
18 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
13 Oct 2024
A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models
Pengxiang Zhao
Hanyu Hu
Ping Li
Yi Zheng
Zhefeng Wang
Xiaoming Yuan
07 Aug 2024
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li
Yijun Dong
Qi Lei
26 Jul 2024
MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models
Hongrong Cheng
Miao Zhang
J. Q. Shi
16 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
Volkan Cevher
Yida Wang
George Karypis
12 Jul 2024
Achieving Sparse Activation in Small Language Models
Jifeng Song
Kai Huang
Xiangyu Yin
Boyuan Yang
Wei Gao
03 Jun 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu Wang
22 Apr 2024
OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
Xiang Meng
Shibal Ibrahim
Kayhan Behdin
Hussein Hazimeh
Natalia Ponomareva
Rahul Mazumder
02 Mar 2024
Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods
Bo-Kyeong Kim
Geonmin Kim
Tae-Ho Kim
Thibault Castells
Shinkook Choi
Junho Shin
Hyoung-Kyu Song
05 Feb 2024
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Eldar Kurtic
Denis Kuznedelev
Elias Frantar
Michael Goin
Dan Alistarh
10 Oct 2023
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Lingjiao Chen
Matei A. Zaharia
James Zou
09 May 2023
Latency Adjustable Transformer Encoder for Language Understanding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Sajjad Kachuee
M. Sharifkhani
10 Jan 2022