SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

International Conference on Machine Learning (ICML), 2023

2 January 2023

Elias Frantar

Dan Alistarh

VLM

Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"

50 / 665 papers shown

Compressing Large Language Models with Automated Sub-Network Search

R. Sukthanker

B. Staffler

Katharina Eggensperger

Aaron Klein

LRM

321

09 Oct 2024

Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs

513

09 Oct 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

IEEE Circuits and Systems Magazine (IEEE CSM), 2024

...

Yiran Chen

226

08 Oct 2024

Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See

Chenliang Xu

253

08 Oct 2024

Mixture Compressor for Mixture-of-Experts LLMs Gains More

International Conference on Learning Representations (ICLR), 2024

Wei Huang

Yue Liao

Jianhui Liu

Ruifei He

Haoru Tan

Shiming Zhang

Hongsheng Li

Si Liu

Xiaojuan Qi

MoE

298

08 Oct 2024

ESPACE: Dimensionality Reduction of Activations for Model Compression

Neural Information Processing Systems (NeurIPS), 2024

Charbel Sakr

Brucek Khailany

260

07 Oct 2024

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

...

633

06 Oct 2024

ARB-LLM: Alternating Refined Binarizations for Large Language Models

International Conference on Learning Representations (ICLR), 2024

Jiang Tian

Linghe Kong

Yulun Zhang

Yunbo Wang

323

04 Oct 2024

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

International Conference on Learning Representations (ICLR), 2024

Yu-Guang Chen

210

02 Oct 2024

Getting Free Bits Back from Rotational Symmetries in LLMs

Wenlin Chen

Gergely Flamich

José Miguel Hernández-Lobato

202

02 Oct 2024

Exploring Gen-AI applications in building research and industry: A review

Building Simulation (BS), 2024

319

01 Oct 2024

Aggressive Post-Training Compression on Extremely Large Language Models

Zining Zhang

Yao Chen

Bingsheng He

Zhenjie Zhang

30 Sep 2024

EEG Emotion Copilot: Optimizing Lightweight LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation

Neural Networks (NN), 2024

...

339

30 Sep 2024

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Asia and South Pacific Design Automation Conference (ASP-DAC), 2024

318

26 Sep 2024

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

Hongxu Yin

Jan Kautz

Xinchao Wang

174

26 Sep 2024

Pruning Multilingual Large Language Models for Multilingual Inference

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

418

25 Sep 2024

Demystifying Issues, Causes and Solutions in LLM Open-Source Projects

Journal of Systems and Software (JSS), 2024

299

25 Sep 2024

Enhancing Aspect-based Sentiment Analysis in Tourism Using Large Language Models and Positional Information

219

23 Sep 2024

CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information

International Conference on Computational Linguistics (COLING), 2024

Yuxin Wang

Zekun Wang

Qing Yang

Ming Liu

Bing Qin

178

20 Sep 2024

OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition

International Conference on Learning Representations (ICLR), 2024

Stephen Zhang

Vardan Papyan

VLM

556

20 Sep 2024

Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models

Bishwash Khanal

Jeffery M. Capone

267

17 Sep 2024

KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

178

17 Sep 2024

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training

Neural Information Processing Systems (NeurIPS), 2024

Yuezhou Hu

Jun-Jie Zhu

Jianfei Chen

414

13 Sep 2024

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

287

10 Sep 2024

Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

Yao Shu

Wenyang Hu

Szu Hui Ng

Bryan Kian Hsiang Low

Fei Richard Yu

FedML

456

10 Sep 2024

Achieving Peak Performance for Large Language Models: A Systematic Review

IEEE Access (IEEE Access), 2024

Z. R. K. Rostam

Sándor Szénási

Gábor Kertész

321

07 Sep 2024

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Chun Jason Xue

02 Sep 2024

OnlySportsLM: Optimizing Sports-Domain Language Models with SOTA Performance under Billion Parameters

200

30 Aug 2024

Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering

Nicholas Pochinkov

Ben Pasero

Skylar Shibayama

185

30 Aug 2024

The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information

Neural Information Processing Systems (NeurIPS), 2024

Diyuan Wu

Ionut-Vlad Modoranu

M. Safaryan

Denis Kuznedelev

Dan Alistarh

331

30 Aug 2024

GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs

Evgeny Burnaev

261

27 Aug 2024

MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning

353

24 Aug 2024

A Tighter Complexity Analysis of SparseGPT

313

22 Aug 2024

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), 2024

230

21 Aug 2024

First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

Yujie Wang

250

21 Aug 2024

Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

International Conference on Computational Linguistics (COLING), 2024

Guanchen Li

Xiandong Zhao

Lian Liu

Zeping Li

Dong Li

172

20 Aug 2024

LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models

179

20 Aug 2024

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

259

20 Aug 2024

MoDeGPT: Modular Decomposition for Large Language Model Compression

International Conference on Learning Representations (ICLR), 2024

757

19 Aug 2024

Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning

596

18 Aug 2024

Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

454

16 Aug 2024

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

Zhongyu Zhao

Menghang Dong

Rongyu Zhang

Wenzhao Zheng

Yunpeng Zhang

Huanrui Yang

Dalong Du

Kurt Keutzer

Shanghang Zhang

327

15 Aug 2024

KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning

International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2024

Kaiqi Zhang

Jing Zhao

Rui Chen

312

15 Aug 2024

Post-Training Sparse Attention with Double Sparsity

Shuo Yang

Ying Sheng

Joseph E. Gonzalez

Ion Stoica

Lianmin Zheng

296

11 Aug 2024

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

IEEE International Symposium on Workload Characterization (IISWC), 2024

378

10 Aug 2024

A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

200

07 Aug 2024

Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024

228

06 Aug 2024

Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations

Leo Donisch

Sigurd Schacht

Carsten Lanquillon

298

06 Aug 2024

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

International Conference on Learning Representations (ICLR), 2024

...

Wei Xue

Wenhan Luo

Qi-fei Liu

Yi-Ting Guo

Xiaowen Chu

202

03 Aug 2024

Finch: Prompt-guided Key-Value Cache Compression

Transactions of the Association for Computational Linguistics (TACL), 2024

Giulio Corallo

Paolo Papotti

425

31 Jul 2024