SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar, Dan Alistarh
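Since this index collects dozens of pruning follow-ups, a brief illustration of what "one-shot pruning" means may help: weights are removed in a single pass, with no retraining. The sketch below is NOT SparseGPT's algorithm (which prunes layer-wise using Hessian-based weight reconstruction); it shows plain magnitude pruning, the simpler baseline many of the papers below compare against. The function name and the toy 8×8 matrix are illustrative assumptions.

```python
# Illustrative only: one-shot *magnitude* pruning, the baseline that
# SparseGPT improves on. SparseGPT itself reconstructs the remaining
# weights layer-by-layer using second-order information (not shown).
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries in a single pass."""
    k = int(weights.size * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W_sparse = magnitude_prune(W, sparsity=0.5)
print(f"sparsity: {np.mean(W_sparse == 0.0):.2f}")  # prints: sparsity: 0.50
```

Surviving weights are left unchanged here; SparseGPT's key contribution is that it instead adjusts them to compensate for the removed ones, which is what makes accurate one-shot pruning possible at GPT scale.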

Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

Showing 50 of 665 citing papers.
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
  Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi (28 May 2024)
Exploring Activation Patterns of Parameters in Language Models
  Yudong Wang, Damai Dai, Zhifang Sui (28 May 2024)
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
  Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian (27 May 2024)
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
  Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu (27 May 2024)
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
  Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers (26 May 2024)
Subspace Node Pruning
  Joshua Offergeld, Marcel van Gerven, Nasir Ahmad (26 May 2024)
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
  Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Shiyang Feng, Jiaming Song (25 May 2024)
Large Language Model Pruning
  Hanjuan Huang, Hao-Jia Song, H. Pao (24 May 2024)
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
  Nolan Dey, Shane Bergsma, Joel Hestness (24 May 2024)
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference (ICLR 2024)
  Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel (23 May 2024)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
  Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Qinshuo Liu, Xianglong Liu, Luca Benini, Michele Magno, Shiming Zhang, Xiaojuan Qi (23 May 2024)
Your Transformer is Secretly Linear
  Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov (19 May 2024)
Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study
  Chi Ma, Mincong Huang, Chao Wang, Yujie Wang, Lei Yu (15 May 2024)
Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models
  Chakshu Moar, Michael Pellauer, Hyoukjun Kwon (10 May 2024)
Pruning as a Domain-specific LLM Extractor
  Nan Zhang, Yanchi Liu, Xujiang Zhao, Wei Cheng, Runxue Bao, Rui Zhang, Prasenjit Mitra, Haifeng Chen (10 May 2024)
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
  Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wen Xie, ..., Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang (09 May 2024)
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
  Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar Sigurdsson, Nanyun Peng, Xin Eric Wang (08 May 2024)
Collage: Light-Weight Low-Precision Strategy for LLM Training (ICML 2024)
  Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith R. Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Hao Ding, Jun Huan (06 May 2024)
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
  Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, ..., Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz (06 May 2024)
Dependency-Aware Semi-Structured Sparsity: Declining Roles of Outliers in Pruning GLU-based LLMs
  Zhiyu Guo, Hidetaka Kamigaito, Taro Wanatnabe (03 May 2024)
COPAL: Continual Pruning in Large Language Generative Models (ICML 2024)
  Srikanth Malla, Joon Hee Choi, Chiho Choi (02 May 2024)
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
  Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica (22 Apr 2024)
A Survey on Efficient Inference for Large Language Models
  Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang (22 Apr 2024)
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
  Dengchun Li, Yingzi Ma, Naizheng Wang, Zhengmao Ye, Zhiyuan Cheng, ..., Yan Zhang, Lei Duan, Jie Zuo, Cal Yang, Mingjie Tang (22 Apr 2024)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
  Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao (18 Apr 2024)
Shears: Unstructured Sparsity with Neural Low-rank Adapter Search
  J. P. Muñoz, Jinjie Yuan, Nilesh Jain (16 Apr 2024)
SparseDM: Toward Sparse Efficient Diffusion Models
  Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun-Jie Zhu (16 Apr 2024)
Language Model Cascades: Token-level uncertainty and beyond
  Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, A. Menon, Sanjiv Kumar (15 Apr 2024)
LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
  Guangyan Li, Yongqiang Tang, Wensheng Zhang (15 Apr 2024)
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
  Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini (12 Apr 2024)
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers (ACL 2024)
  Longwei Zou, Qingyang Wang, Han Zhao, Tingfeng Liu, Yi Yang, Yangdong Deng (10 Apr 2024)
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
  Matteo Farina, Goran Frehse, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci (08 Apr 2024)
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
  Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng (08 Apr 2024)
DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model
  Chao Gao, Sai Qian Zhang (08 Apr 2024)
Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind
  Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu (06 Apr 2024)
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
  Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella (05 Apr 2024)
Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration
  Shwai He, Ang Li, Tianlong Chen (03 Apr 2024)
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
  Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie (29 Mar 2024)
LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models
  Mingxing Peng, Xusen Guo, Xianda Chen, Meixin Zhu, Kehua Chen, Hao Yang, Xuesong Wang, Yinhai Wang (27 Mar 2024)
AI and Memory Wall
  A. Gholami, Z. Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer (21 Mar 2024)
Pruning for Improved ADC Efficiency in Crossbar-based Analog In-memory Accelerators
  Timur Ibrayev, Isha Garg, I. Chakraborty, Kaushik Roy (19 Mar 2024)
Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference
  Baolin Li, Yankai Jiang, V. Gadepally, Devesh Tiwari (19 Mar 2024)
MELTing point: Mobile Evaluation of Language Transformers
  Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, Hamed Haddadi (19 Mar 2024)
Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model (COLING 2024)
  Haoyun Xu, Runzhe Zhan, Yang Li, Lidia S. Chao (18 Mar 2024)
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression (ICML 2024)
  Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, ..., B. Kailkhura, Dan Hendrycks, Dawn Song, Zinan Lin, Yue Liu (18 Mar 2024)
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
  Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Zhenglun Kong, ..., Wei Niu, Yanzhi Wang, Xue Lin, Dong Huang (16 Mar 2024)
FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices
  Arnab Raha, Deepak A. Mathaikutty, Soumendu Kumar Ghosh, Shamik Kundu (14 Mar 2024)
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression (ICLR 2024)
  Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang (12 Mar 2024)
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen (06 Mar 2024)
DPPA: Pruning Method for Large Language Model to Model Merging
  Yaochen Zhu, Rui Xia, Jiajun Zhang (05 Mar 2024)