ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.11695
  4. Cited By
A Simple and Effective Pruning Approach for Large Language Models

A Simple and Effective Pruning Approach for Large Language Models

20 June 2023
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
ArXivPDFHTML

Papers citing "A Simple and Effective Pruning Approach for Large Language Models"

50 / 271 papers shown
Title
RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural
  Pruned LLMs
RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs
Changhai Zhou
Shijie Han
Shiyang Zhang
Shichao Weng
Zekai Liu
Cheng Jin
21
1
0
22 Jun 2024
SDQ: Sparse Decomposed Quantization for LLM Inference
SDQ: Sparse Decomposed Quantization for LLM Inference
Geonhwa Jeong
Po-An Tsai
S. Keckler
Tushar Krishna
MQ
30
3
0
19 Jun 2024
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Ke Cheng
Wen Hu
Zhi Wang
Hongen Peng
Jianguo Li
Sheng Zhang
43
7
0
19 Jun 2024
Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference
Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference
Donghyeon Joo
Ramyad Hadidi
S. Feizi
Bahar Asgari
MQ
16
0
0
17 Jun 2024
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained
  Language Model for Knowledge Editing and Fine-tuning
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
Haoyu Wang
Tianci Liu
Ruirui Li
Monica Cheng
Tuo Zhao
Jing Gao
29
7
0
16 Jun 2024
ME-Switch: A Memory-Efficient Expert Switching Framework for Large
  Language Models
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
Jing Liu
Ruihao Gong
Mingyang Zhang
Yefei He
Jianfei Cai
Bohan Zhuang
MoE
37
0
0
13 Jun 2024
AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
Emil Biju
Anirudh Sriram
Mert Pilanci
32
0
0
13 Jun 2024
ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large
  Language Models
ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models
Xiang Meng
Kayhan Behdin
Haoyue Wang
Rahul Mazumder
29
3
0
12 Jun 2024
MoreauPruner: Robust Pruning of Large Language Models against Weight
  Perturbations
MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations
Zixiao Wang
Jingwei Zhang
Wenqian Zhao
Farzan Farnia
Bei Yu
AAML
30
3
0
11 Jun 2024
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training
  Multiplication-Less Reparameterization
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You
Yipin Guo
Yichao Fu
Wei Zhou
Huihong Shi
Xiaofan Zhang
Souvik Kundu
Amir Yazdanbakhsh
Y. Lin
KELM
44
7
0
10 Jun 2024
Evaluating Zero-Shot Long-Context LLM Compression
Evaluating Zero-Shot Long-Context LLM Compression
Chenyu Wang
Yihan Wang
Kai Li
49
0
0
10 Jun 2024
VTrans: Accelerating Transformer Compression with Variational
  Information Bottleneck based Pruning
VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
Oshin Dutta
Ritvik Gupta
Sumeet Agarwal
36
1
0
07 Jun 2024
Enabling Efficient Batch Serving for LMaaS via Generation Length
  Prediction
Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
Ke Cheng
Wen Hu
Zhi Wang
Peng Du
Jianguo Li
Sheng Zhang
34
10
0
07 Jun 2024
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge
  Devices
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices
Ruiyang Qin
Dancheng Liu
Zheyu Yan
Zhaoxuan Tan
Zixuan Pan
Zhenge Jia
Meng-Long Jiang
Ahmed Abbasi
Jinjun Xiong
Yiyu Shi
51
10
0
06 Jun 2024
A Survey of Language-Based Communication in Robotics
A Survey of Language-Based Communication in Robotics
William Hunt
Sarvapali D. Ramchurn
Mohammad D. Soorati
LM&Ro
47
12
0
06 Jun 2024
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large
  Language Models
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models
Peijie Dong
Lujun Li
Zhenheng Tang
Xiang Liu
Xinglin Pan
Qiang-qiang Wang
Xiaowen Chu
48
22
0
05 Jun 2024
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling
  for LLM
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
Quandong Wang
Yuxuan Yuan
Xiaoyu Yang
Ruike Zhang
Kang Zhao
Wei Liu
Jian Luan
Daniel Povey
Bin Wang
41
0
0
03 Jun 2024
Memorized Images in Diffusion Models share a Subspace that can be
  Located and Deleted
Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted
Ruchika Chavhan
Ondrej Bohdal
Yongshuo Zong
Da Li
Timothy M. Hospedales
29
4
0
01 Jun 2024
Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
Xinxi Zhang
Song Wen
Ligong Han
Felix Juefei Xu
Akash Srivastava
Junzhou Huang
Hao Wang
Molei Tao
Dimitris N. Metaxas
DiffM
23
5
0
31 May 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma
Ayan Chakraborty
Elizaveta Kostenok
Danila Mishin
Dongho Ha
...
Martin Jaggi
Ming Liu
Yunho Oh
Suvinay Subramanian
Amir Yazdanbakhsh
MQ
29
4
0
31 May 2024
Occam Gradient Descent
Occam Gradient Descent
B. N. Kausik
ODL
VLM
32
0
0
30 May 2024
STAT: Shrinking Transformers After Training
STAT: Shrinking Transformers After Training
Megan Flynn
Alexander Wang
Dean Edward Alvarez
Christopher De Sa
Anil Damle
31
1
0
29 May 2024
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron
  Pruning
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Ruchika Chavhan
Da Li
Timothy M. Hospedales
31
15
0
29 May 2024
Superposed Decoding: Multiple Generations from a Single Autoregressive
  Inference Pass
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Ethan Shen
Alan Fan
Sarah M Pratt
Jae Sung Park
Matthew Wallingford
Sham Kakade
Ari Holtzman
Ranjay Krishna
Ali Farhadi
Aditya Kusupati
33
2
0
28 May 2024
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for
  Memory-Efficient LLM Fine-tuning
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
Pengxiang Li
Lu Yin
Xiaowei Gao
Shiwei Liu
21
7
0
28 May 2024
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language
  Models
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang
Yawei Li
Xinpeng Wang
Qianli Shen
Barbara Plank
Bernd Bischl
Mina Rezaei
Kenji Kawaguchi
47
7
0
28 May 2024
Exploring Activation Patterns of Parameters in Language Models
Exploring Activation Patterns of Parameters in Language Models
Yudong Wang
Damai Dai
Zhifang Sui
24
1
0
28 May 2024
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee
Rajarshi Roy
Mengyao Xu
Jonathan Raiman
M. Shoeybi
Bryan Catanzaro
Wei Ping
RALM
40
137
0
27 May 2024
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to
  Multimodal Inputs
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Mustafa Shukor
Matthieu Cord
61
5
0
26 May 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large
  Language Models
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
Xudong Lu
Aojun Zhou
Yuhui Xu
Renrui Zhang
Peng Gao
Hongsheng Li
19
7
0
25 May 2024
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM
  Compression
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
Vladimir Malinovskii
Denis Mazur
Ivan Ilin
Denis Kuznedelev
Konstantin Burlachenko
Kai Yi
Dan Alistarh
Peter Richtárik
MQ
29
18
0
23 May 2024
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large
  Language Models
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang
Haotong Qin
Yangdong Liu
Yawei Li
Xianglong Liu
Luca Benini
Michele Magno
Xiaojuan Qi
MQ
57
15
0
23 May 2024
Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study
Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study
Chi Ma
Mincong Huang
Chao Wang
Yujie Wang
Lei Yu
11
2
0
15 May 2024
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage
  Pruning
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Dan Qiao
Yi Su
Pinzheng Wang
Jing Ye
Wen Xie
...
Wenliang Chen
Guohong Fu
Guodong Zhou
Qiaoming Zhu
Min Zhang
MQ
32
0
0
09 May 2024
Dependency-Aware Semi-Structured Sparsity: Declining Roles of Outliers
  in Pruning GLU-based LLMs
Dependency-Aware Semi-Structured Sparsity: Declining Roles of Outliers in Pruning GLU-based LLMs
Zhiyu Guo
Hidetaka Kamigaito
Taro Wanatnabe
19
0
0
03 May 2024
COPAL: Continual Pruning in Large Language Generative Models
COPAL: Continual Pruning in Large Language Generative Models
Srikanth Malla
Joon Hee Choi
Chiho Choi
VLM
CLL
19
1
0
02 May 2024
Beyond the Speculative Game: A Survey of Speculative Execution in Large
  Language Models
Beyond the Speculative Game: A Survey of Speculative Execution in Large Language Models
Chen Zhang
Zhuorui Liu
Dawei Song
LRM
22
3
0
23 Apr 2024
A Survey on Efficient Inference for Large Language Models
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
78
0
22 Apr 2024
LoRAP: Transformer Sub-Layers Deserve Differentiated Structured
  Compression for Large Language Models
LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
Guangyan Li
Yongqiang Tang
Wensheng Zhang
41
5
0
15 Apr 2024
Exploring and Improving Drafts in Blockwise Parallel Decoding
Exploring and Improving Drafts in Blockwise Parallel Decoding
Taehyeon Kim
A. Suresh
Kishore Papineni
Michael Riley
Sanjiv Kumar
Adrian Benton
AI4TS
47
2
0
14 Apr 2024
CATS: Contextually-Aware Thresholding for Sparsity in Large Language
  Models
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee
Donghyun Lee
Genghan Zhang
Mo Tiwari
Azalia Mirhoseini
33
11
0
12 Apr 2024
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina
Massimiliano Mancini
Elia Cunegatti
Gaowen Liu
Giovanni Iacca
Elisa Ricci
VLM
23
2
0
08 Apr 2024
LayerNorm: A key component in parameter-efficient fine-tuning
LayerNorm: A key component in parameter-efficient fine-tuning
Taha ValizadehAslani
Hualou Liang
28
1
0
29 Mar 2024
Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output
  Channel Pruning on Computer Vision Tasks
Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks
Guanhua Ding
Zexi Ye
Zhen Zhong
Gang Li
David Shao
34
0
0
29 Mar 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient
  LLMs Under Compression
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong
Jinhao Duan
Chenhui Zhang
Zhangheng Li
Chulin Xie
...
B. Kailkhura
Dan Hendrycks
Dawn Song
Zhangyang Wang
Bo-wen Li
34
24
0
18 Mar 2024
Neural Erosion: Emulating Controlled Neurodegeneration and Aging in AI
  Systems
Neural Erosion: Emulating Controlled Neurodegeneration and Aging in AI Systems
Antonios Alexos
Yu-Dai Tsai
Ian Domingo
Maryam Pishgar
Pierre Baldi
19
0
0
15 Mar 2024
CLLMs: Consistency Large Language Models
CLLMs: Consistency Large Language Models
Siqi Kou
Lanxiang Hu
Zhe He
Zhijie Deng
Hao Zhang
39
26
0
28 Feb 2024
SparseLLM: Towards Global Pruning for Pre-trained Language Models
SparseLLM: Towards Global Pruning for Pre-trained Language Models
Guangji Bai
Yijiang Li
Chen Ling
Kibaek Kim
Liang Zhao
16
6
0
28 Feb 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
37
77
0
26 Feb 2024
Data-free Weight Compress and Denoise for Large Language Models
Data-free Weight Compress and Denoise for Large Language Models
Runyu Peng
Yunhua Zhou
Qipeng Guo
Yang Gao
Hang Yan
Xipeng Qiu
Dahua Lin
26
1
0
26 Feb 2024
Previous
123456
Next