ResearchTrend.AI

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
arXiv:2301.00774 (v3, latest)

International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
    VLM
Links: arXiv (abs) · PDF · HTML · HuggingFace (3 upvotes) · GitHub (799★)

Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

50 / 664 papers shown
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
International Conference on Learning Representations (ICLR), 2023
Yuxin Zhang
Lirui Zhao
Mingbao Lin
Yunyun Sun
Yiwu Yao
Xingjia Han
Jared Tanner
Shiwei Liu
Rongrong Ji
SyDa
281
65
0
13 Oct 2023
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Eldar Kurtic
Denis Kuznedelev
Elias Frantar
Michael Goin
Dan Alistarh
154
16
0
10 Oct 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
International Conference on Learning Representations (ICLR), 2023
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
377
402
0
10 Oct 2023
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Huiqiang Jiang
Qianhui Wu
Chin-Yew Lin
Yuqing Yang
Lili Qiu
374
175
0
09 Oct 2023
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
Song Guo
Jiahang Xu
Li Zhang
Mao Yang
228
18
0
08 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
International Conference on Machine Learning (ICML), 2023
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zinan Lin
Shiwei Liu
503
136
0
08 Oct 2023
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang
Wen Fei
Weijia Wu
Yefei He
Zhenyu Lou
Hong Zhou
MQ
182
5
0
07 Oct 2023
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
Gintare Karolina Dziugaite
LRM
275
5
0
07 Oct 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
International Conference on Learning Representations (ICLR), 2023
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
462
99
0
06 Oct 2023
SPADE: Sparsity-Guided Debugging for Deep Neural Networks
International Conference on Machine Learning (ICML), 2023
Arshia Soltani Moakhar
Eugenia Iofinova
Elias Frantar
Dan Alistarh
290
2
0
06 Oct 2023
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
VLM
253
19
0
04 Oct 2023
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Chen Dun
Mirian Hipolito Garcia
Guoqing Zheng
Ahmed Hassan Awadallah
Anastasios Kyrillidis
Robert Sim
366
7
0
04 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Roberto L. Castro
Andrei Ivanov
Diego Andrade
Tal Ben-Nun
B. Fraguela
Torsten Hoefler
153
30
0
03 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
International Conference on Learning Representations (ICLR), 2023
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zinan Lin
Yinfei Yang
MQ
254
60
0
02 Oct 2023
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
Duc Hoang
Minsik Cho
Thomas Merth
Mohammad Rastegari
Zhangyang Wang
KELM CLL
235
5
0
02 Oct 2023
PB-LLM: Partially Binarized Large Language Models
International Conference on Learning Representations (ICLR), 2023
Yuzhang Shang
Zhihang Yuan
Qiang Wu
Zhen Dong
MQ
326
76
0
29 Sep 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
International Conference on Machine Learning (ICML), 2023
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zinan Lin
341
7
0
29 Sep 2023
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2023
Yuhui Xu
Lingxi Xie
Xiaotao Gu
Xin Chen
Heng Chang
Hengheng Zhang
Zhensu Chen
Xiaopeng Zhang
Qi Tian
MQ
194
148
0
26 Sep 2023
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Ayush Kaushal
Tejas Vaidhya
Irina Rish
307
24
0
25 Sep 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Proceedings of the VLDB Endowment (PVLDB), 2023
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
Shuaiwen Leon Song
154
21
0
19 Sep 2023
Pruning Large Language Models via Accuracy Predictor
Yupeng Ji
Yibo Cao
Jiu-si Liu
KELM
177
4
0
18 Sep 2023
Scaling Laws for Sparsely-Connected Foundation Models
International Conference on Learning Representations (ICLR), 2023
Elias Frantar
C. Riquelme
N. Houlsby
Dan Alistarh
Utku Evci
231
46
0
15 Sep 2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Liang Li
Qingyuan Li
Bo Zhang
Xiangxiang Chu
MQ
261
39
0
06 Sep 2023
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
Qingyuan Li
Yifan Zhang
Liang Li
Peng Yao
Bo Zhang
Xiangxiang Chu
Yerui Sun
Li-Qiang Du
Yuchen Xie
MQ
222
18
0
30 Aug 2023
EdgeMoE: Empowering Sparse Large Language Models on Mobile Devices
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Rongjie Yi
Liwei Guo
Shiyun Wei
Ao Zhou
Shangguang Wang
Mengwei Xu
MoE
129
24
0
28 Aug 2023
Ternary Singular Value Decomposition as a Better Parameterized Form in Linear Mapping
Boyu Chen
Hanxuan Chen
Jiao He
Fengyu Sun
Shangling Jui
156
3
0
15 Aug 2023
A Survey on Model Compression for Large Language Models
Transactions of the Association for Computational Linguistics (TACL), 2023
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
306
345
0
15 Aug 2023
SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models
Sara Babakniya
A. Elkordy
Yahya H. Ezzeldin
Qingfeng Liu
Kee-Bong Song
Mostafa El-Khamy
Salman Avestimehr
160
105
0
12 Aug 2023
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
International Conference on Innovative Computing and Cloud Computing (ICCC), 2023
Yuhan Ma
Haiqi Jiang
Chenyou Fan
LRM
156
17
0
09 Aug 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
International Conference on Learning Representations (ICLR), 2023
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
182
12
0
07 Aug 2023
A Survey of Techniques for Optimizing Transformer Inference
Journal of Systems Architecture (JSA), 2023
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
235
116
0
16 Jul 2023
Pruning vs Quantization: Which is Better?
Neural Information Processing Systems (NeurIPS), 2023
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Arash Behboodi
Tijmen Blankevoort
MQ
288
99
0
06 Jul 2023
Query Understanding in the Age of Large Language Models
Avishek Anand
Venktesh V
Abhijit Anand
Vinay Setty
LRM
239
10
0
28 Jun 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Zhenyu Zhang
Ying Sheng
Wanrong Zhu
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zinan Lin
Beidi Chen
VLM
683
463
0
24 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
461
637
0
20 Jun 2023
ModuleFormer: Modularity Emerges from Mixture-of-Experts
Songlin Yang
Zheyu Zhang
Tianyou Cao
Shawn Tan
Zhenfang Chen
Chuang Gan
KELM MoE
171
13
0
07 Jun 2023
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Neural Information Processing Systems (NeurIPS), 2023
Ajay Jaiswal
Shiwei Liu
Tianlong Chen
Zinan Lin
VLM
239
44
0
06 Jun 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
International Conference on Learning Representations (ICLR), 2023
Tim Dettmers
Ruslan Svirschevski
Vage Egiazarian
Denis Kuznedelev
Elias Frantar
Saleh Ashkboos
Alexander Borzunov
Torsten Hoefler
Dan Alistarh
MQ
165
326
0
05 Jun 2023
Intriguing Properties of Quantization at Scale
Neural Information Processing Systems (NeurIPS), 2023
Arash Ahmadian
Saurabh Dash
Hongyu Chen
Bharat Venkitesh
Stephen Gou
Phil Blunsom
Ahmet Üstün
Sara Hooker
MQ
229
43
0
30 May 2023
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Neural Information Processing Systems (NeurIPS), 2023
Zichang Liu
Aditya Desai
Fangshuo Liao
Weitao Wang
Victor Xie
Zhaozhuo Xu
Anastasios Kyrillidis
Anshumali Shrivastava
307
306
0
26 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Neural Information Processing Systems (NeurIPS), 2023
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
338
68
0
25 May 2023
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Neural Information Processing Systems (NeurIPS), 2023
Zirui Liu
Guanchu Wang
Shaochen Zhong
Zhaozhuo Xu
Daochen Zha
...
Zhimeng Jiang
Kaixiong Zhou
Vipin Chaudhary
Shuai Xu
Helen Zhou
242
21
0
24 May 2023
Just CHOP: Embarrassingly Simple LLM Compression
A. Jha
Tom Sherborne
Evan Pete Walsh
Dirk Groeneveld
Emma Strubell
Iz Beltagy
209
4
0
24 May 2023
LLM-Pruner: On the Structural Pruning of Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Xinyin Ma
Gongfan Fang
Xinchao Wang
605
642
0
19 May 2023
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
Zhaozhuo Xu
Zirui Liu
Beidi Chen
Yuxin Tang
Jue Wang
Kaixiong Zhou
Helen Zhou
Anshumali Shrivastava
MQ
238
39
0
17 May 2023
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Xupeng Miao
Zhihao Zhang
Xinhao Cheng
Zeyu Wang
...
Chunan Shi
Zhuoming Chen
Daiyaan Arfeen
Reyna Abhyankar
Zhihao Jia
LRM
414
247
0
16 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation
J. Heo
S. Azizi
A. Fayyazi
Massoud Pedram
205
1
0
08 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Information Processing Systems (NeurIPS), 2023
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
493
437
0
28 Apr 2023
Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models
D. Honegger
Konstantin Schurholt
Damian Borth
236
5
0
26 Apr 2023
Towards Compute-Optimal Transfer Learning
Massimo Caccia
Alexandre Galashov
Arthur Douillard
Amal Rannen-Triki
Dushyant Rao
Michela Paganini
Laurent Charlin
Marc'Aurelio Ranzato
Razvan Pascanu
137
3
0
25 Apr 2023