ZipLM: Inference-Aware Structured Pruning of Language Models

Neural Information Processing Systems (NeurIPS), 2023
7 February 2023
Eldar Kurtic
Elias Frantar
Dan Alistarh
arXiv: 2302.04089

Papers citing "ZipLM: Inference-Aware Structured Pruning of Language Models"

27 citing papers
C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression
Baptiste Bauvin
Loïc Baret
Ola Ahmad
21 Oct 2025
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
Firas Gabetni
Giuseppe Curci
Andrea Pilzer
Subhankar Roy
Elisa Ricci
Gianni Franchi
21 Oct 2025
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo
Shengkun Tang
Cong Zeng
Zhiqiang Shen
13 Oct 2025
PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
Xin Yu
Cong Xie
Ziyu Zhao
Tiantian Fan
Lingzhou Xue
Zhi-Li Zhang
30 Sep 2025
Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches
Shirin Alanova
Kristina Kazistova
Ekaterina Galaeva
Alina Kostromina
Vladimir Smirnov
Redko Dmitry
Alexey Dontsov
Maxim Zhelnin
Evgeny Burnaev
Egor Shvetsov
26 Sep 2025
ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai
Siyong Jian
Tuo Liang
Yu Yin
Huan Wang
26 May 2025
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu
Xiaoming Yuan
06 May 2025
TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks
Proceedings on Privacy Enhancing Technologies (PoPETs), 2025
Mohammad Maheri
Hamed Haddadi
Alex Davidson
27 Apr 2025
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang
Ligong Han
Kai Xu
Akash Srivastava
31 Mar 2025
Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
Yuanze Li
Shihao Yuan
Haolin Wang
Qizhang Li
Ming-Yu Liu
Chen Xu
Guangming Shi
Wangmeng Zuo
17 Mar 2025
Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs
Xuan Ding
Rui Sun
Yunjian Zhang
Xiu Yan
Yueqi Zhou
Kaihao Huang
Suzhong Fu
Angelica I Aviles-Rivero
Chuanlong Xie
Yao Zhu
26 Feb 2025
SlimGPT: Layer-wise Structured Pruning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Gui Ling
Ziyang Wang
Yuliang Yan
Qingwen Liu
24 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
18 Dec 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
11 Nov 2024
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Oliver Sieberling
Denis Kuznedelev
Eldar Kurtic
Dan Alistarh
18 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
13 Oct 2024
A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models
Pengxiang Zhao
Hanyu Hu
Ping Li
Yi Zheng
Zhefeng Wang
Xiaoming Yuan
07 Aug 2024
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li
Yijun Dong
Qi Lei
26 Jul 2024
MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models
Hongrong Cheng
Miao Zhang
J. Q. Shi
16 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
Volkan Cevher
Yida Wang
George Karypis
12 Jul 2024
Achieving Sparse Activation in Small Language Models
Jifeng Song
Kai Huang
Xiangyu Yin
Boyuan Yang
Wei Gao
03 Jun 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu Wang
22 Apr 2024
OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
Xiang Meng
Shibal Ibrahim
Kayhan Behdin
Hussein Hazimeh
Natalia Ponomareva
Rahul Mazumder
02 Mar 2024
Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods
Bo-Kyeong Kim
Geonmin Kim
Tae-Ho Kim
Thibault Castells
Shinkook Choi
Junho Shin
Hyoung-Kyu Song
05 Feb 2024
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Eldar Kurtic
Denis Kuznedelev
Elias Frantar
Michael Goin
Dan Alistarh
10 Oct 2023
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Lingjiao Chen
Matei A. Zaharia
James Zou
09 May 2023
Latency Adjustable Transformer Encoder for Language Understanding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Sajjad Kachuee
M. Sharifkhani
10 Jan 2022