ResearchTrend.AI

arXiv:2306.11695
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter
20 June 2023

Papers citing "A Simple and Effective Pruning Approach for Large Language Models"

50 / 271 papers shown
1. Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
   Stanislas Laborde, Martin Cousseau, Antoun Yaacoub, Lionel Prevost · 12 May 2025 · [MQ]
2. Sponge Attacks on Sensing AI: Energy-Latency Vulnerabilities and Defense via Model Pruning
   Syed Mhamudul Hasan, Hussein Zangoti, Iraklis Anagnostopoulos, Abdur R. Shahid · 09 May 2025 · [AAML]
3. FloE: On-the-Fly MoE Inference on Memory-constrained GPU
   Yuxin Zhou, Zheng Li, J. Zhang, Jue Wang, Y. Wang, Zhongle Xie, Ke Chen, Lidan Shou · 09 May 2025 · [MoE]
4. SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
   Hanyu Hu, Xiaoming Yuan · 06 May 2025
5. EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
   Arnab Sanyal, Prithwish Mukherjee, Gourav Datta, Sandeep P. Chinchali · 05 May 2025 · [MQ]
6. Beyond the model: Key differentiators in large language models and multi-agent services
   Muskaan Goyal, Pranav Bhasin · 05 May 2025 · [LLMAG, ELM]
7. ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
   Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, N. Komodakis, Sergey Zagoruyko · 05 May 2025 · [VLM]
8. Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
   Chuan Sun, Han Yu, Lizhen Cui, Xiaoxiao Li · 03 May 2025
9. Position: Enough of Scaling LLMs! Lets Focus on Downscaling
   Ayan Sengupta, Yash Goel, Tanmoy Chakraborty · 02 May 2025
10. Efficient LLMs with AMP: Attention Heads and MLP Pruning
    Leandro Giusti Mugnaini, Bruno Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Edson Bollis, Lucas Pellicer, A. H. R. Costa, Artur Jordao · 29 Apr 2025
11. Towards Faster and More Compact Foundation Models for Molecular Property Prediction
    Yasir Ghunaim, Andrés Villa, Gergo Ignacz, Gyorgy Szekely, Motasem Alfarra, Bernard Ghanem · 28 Apr 2025 · [AI4CE]
12. R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
    Zhenyu (Allen) Zhang, Zechun Liu, Yuandong Tian, Harshit Khaitan, Z. Wang, Steven Li · 28 Apr 2025
13. Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks
    Yang Janet Liu, Bingjie Yan, Tianyuan Zou, Jianqing Zhang, Zixuan Gu, ..., J. Li, Xiaozhou Ye, Ye Ouyang, Qiang Yang, Y. Zhang · 24 Apr 2025 · [ALM]
14. NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
    Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang · 20 Apr 2025 · [MQ]
15. Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
    Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna · 19 Apr 2025
16. Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
    Nairouz Mrabah, Nicolas Richet, Ismail Ben Ayed, Eric Granger · 16 Apr 2025 · [BDL, VLM]
17. Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations
    Zican Dong, Han Peng, Peiyu Liu, Wayne Xin Zhao, Dong Wu, Feng Xiao, Z. Wang · 09 Apr 2025 · [MoE]
18. Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
    Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, C. Yuan, Lu Hou · 07 Apr 2025 · [MQ, LRM]
19. AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
    Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang · 07 Apr 2025
20. Compression Laws for Large Language Models
    Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty · 06 Apr 2025
21. Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression
    Ivan Ilin, Peter Richtárik · 06 Apr 2025
22. Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models
    Yuheng Wu, Wentao Guo, Zirui Liu, Heng Ji, Zhaozhuo Xu, Denghui Zhang · 05 Apr 2025
23. When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
    Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang · 02 Apr 2025 · [MQ, LRM]
24. Model Hemorrhage and the Robustness Limits of Large Language Models
    Ziyang Ma, Z. Li, L. Zhang, Gui-Song Xia, Bo Du, Liangpei Zhang, Dacheng Tao · 31 Mar 2025
25. SQuat: Subspace-orthogonal KV Cache Quantization
    Hao Wang, Ligong Han, Kai Xu, Akash Srivastava · 31 Mar 2025 · [MQ]
26. As easy as PIE: understanding when pruning causes language models to disagree
    Pietro Tropeano, Maria Maistro, Tuukka Ruotsalo, Christina Lioma · 27 Mar 2025
27. Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
    Abdelrahman M. Shaker, Muhammad Maaz, Chenhui Gou, Hamid Rezatofighi, Salman Khan, F. Khan · 27 Mar 2025
28. Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
    Han Wu, Yuxuan Yao, Shuqi Liu, Zehua Liu, Xiaojin Fu, Xiongwei Han, X. Li, Hui-Ling Zhen, Tao Zhong, Mingxuan Yuan · 26 Mar 2025 · [MoMe, LRM]
29. Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
    Chang Gao, Kang Zhao, J. Chen, Liping Jing · 24 Mar 2025
30. ZeroLM: Data-Free Transformer Architecture Search for Language Models
    Zhen-Song Chen, Hong-Wei Ding, Xian-Jia Wang, Witold Pedrycz · 24 Mar 2025
31. Energy-Aware LLMs: A step towards sustainable AI for downstream applications
    Nguyen Phuc Tran, Brigitte Jaumard, Oscar Delgado · 22 Mar 2025
32. LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
    Jian Liang, Wenke Huang, Guancheng Wan, Qu Yang, Mang Ye · 21 Mar 2025 · [MoMe, CLL, AI4CE]
33. Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
    Daniel Haziza, Timothy Chou, Dhruv Choudhary, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut, Jesse Cai · 20 Mar 2025
34. EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
    Yinan Liang, Z. Wang, Xiuwei Xu, Jie Zhou, Jiwen Lu · 19 Mar 2025 · [VLM, LRM]
35. Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
    Yuanze Li, Shihao Yuan, Haolin Wang, Qizhang Li, Ming-Yu Liu, Chen Xu, Guangming Shi, Wangmeng Zuo · 17 Mar 2025
36. Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
    Nir Ailon, Akhiad Bercovich, Omri Weinstein · 15 Mar 2025
37. Towards Extreme Pruning of LLMs with Plug-and-Play Mixed Sparsity
    Chi Xu, Gefei Zhang, Yantong Zhu, Luca Benini, Guosheng Hu, Yawei Li, Zhihong Zhang · 14 Mar 2025
38. Safe Vision-Language Models via Unsafe Weights Manipulation
    Moreno D'Incà, E. Peruzzo, Xingqian Xu, Humphrey Shi, N. Sebe, Massimiliano Mancini · 14 Mar 2025 · [MU]
39. Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
    Duy Phuong Nguyen, J. P. Muñoz, Tanya Roosta, Ali Jannesari · 10 Mar 2025 · [FedML]
40. Wanda++: Pruning Large Language Models via Regional Gradients
    Yifan Yang, Kai Zhen, Bhavana Ganesh, Aram Galstyan, Goeric Huybrechts, ..., S. Bodapati, Nathan Susanj, Zheng Zhang, Jack FitzGerald, Abhishek Kumar · 06 Mar 2025
41. Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
    Wenke Huang, Jian Liang, Xianda Guo, Yiyang Fang, Guancheng Wan, ..., Bin Yang, He Li, Jiawei Shao, Mang Ye, Bo Du · 06 Mar 2025 · [OffRL, LRM, MLLM, KELM, VLM]
42. LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach
    Hetarth Chopra, Vidhi Rambhia, Vikram Adve · 05 Mar 2025 · [MoMe]
43. RSQ: Learning from Important Tokens Leads to Better Quantized LLMs
    Yi-Lin Sung, Prateek Yadav, Jialu Li, Jaehong Yoon, Mohit Bansal · 03 Mar 2025 · [MQ]
44. Revisiting Large Language Model Pruning using Neuron Semantic Attribution
    Yizhuo Ding, Xinwei Sun, Yanwei Fu, Guosheng Hu · 03 Mar 2025
45. CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
    Zongzhen Yang, Binhang Qi, Hailong Sun, Wenrui Long, Ruobing Zhao, Xiang Gao · 26 Feb 2025 · [MoMe]
46. A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs
    Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, Yao Zhu · 26 Feb 2025
47. The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
    Zhenheng Tang, Xiang Liu, Qian Wang, Peijie Dong, Bingsheng He, Xiaowen Chu, Bo Li · 24 Feb 2025 · [LRM]
48. Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
    Qi Le, Enmao Diao, Ziyan Wang, Xinran Wang, Jie Ding, Li Yang, Ali Anwar · 24 Feb 2025
49. Dynamic Low-Rank Sparse Adaptation for Large Language Models
    Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Y. Liu, Jing Lin, Yiwu Yao, Rongrong Ji · 21 Feb 2025
50. The Impact of Inference Acceleration on Bias of LLMs
    Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar · 20 Feb 2025