arXiv:2301.00774 (v3, latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
VLM
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
GitHub (799★)
Papers citing
"SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
50 / 665 papers shown
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Qianhan Feng
Wenshuo Li
Tong Lin
Xinghao Chen
VLM
310
7
0
02 Dec 2024
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici
Davide Belli
M. V. Baalen
Amir Jalalirad
Andrii Skliar
Bence Major
Markus Nagel
Paul N. Whatmough
579
9
0
02 Dec 2024
Is Oracle Pruning the True Oracle?
Sicheng Feng
Keda Tao
Haoyu Wang
VLM
351
2
0
28 Nov 2024
Preserving Deep Representations In One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework
International Conference on Learning Representations (ICLR), 2024
Ryan Lucas
Rahul Mazumder
313
6
0
27 Nov 2024
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Andrii Skliar
T. V. Rozendaal
Romain Lepert
Todor Boinovski
M. V. Baalen
Markus Nagel
Paul N. Whatmough
B. Bejnordi
MoE
408
7
0
27 Nov 2024
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu
Hao Cheng
Yujie Fang
Zeyu Wang
Jiaheng Wei
Dongwei Xu
Qi Xuan
Xiaoniu Yang
Zhaowei Zhu
340
16
0
23 Nov 2024
Layer Pruning with Consensus: A Triple-Win Solution
IEEE Access (IEEE Access), 2024
Leandro Giusti Mugnaini
Carolina Tavares Duarte
Anna Helena Reali Costa
Artur Jordao
297
1
0
21 Nov 2024
AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning
Changhai Zhou
Shiyang Zhang
Yuhua Zhou
Zekai Liu
Shichao Weng
MQ
214
0
0
21 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
VLM
573
2
0
21 Nov 2024
From Pruning to Grafting: Dynamic Knowledge Redistribution via Learnable Layer Fusion
Zehua Pei
Hui-Ling Zhen
Xianzhi Yu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
AI4CE
530
5
0
21 Nov 2024
SAM Decoding: Speculative Decoding via Suffix Automaton
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuxuan Hu
Ke Wang
Jing Zhang
Fanjin Zhang
Xuefei Liu
Zeyang Zhang
Jing Zhang
475
18
0
16 Nov 2024
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Neural Information Processing Systems (NeurIPS), 2024
Y. Fu
Zhongzhi Yu
Junwei Li
Jiayi Qian
Yongan Zhang
Xiangchi Yuan
Dachuan Shi
Roman Yakunin
Y. Lin
285
7
0
15 Nov 2024
P² Law: Scaling Law for Post-Training After Model Pruning
Xiaodong Chen
Yuxuan Hu
Jing Zhang
Yanling Wang
Xuefei Liu
Zeyang Zhang
Jing Zhang
232
0
0
15 Nov 2024
Reducing Reasoning Costs: The Path of Optimization for Chain of Thought via Sparse Attention Mechanism
Libo Wang
LRM
AI4CE
548
0
0
14 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
641
2
0
11 Nov 2024
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
Hongpeng Jin
Yanzhao Wu
550
20
0
05 Nov 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Neural Information Processing Systems (NeurIPS), 2024
Yang Yue
Yulin Wang
Bingyi Kang
Yizeng Han
Shenzhi Wang
Shiji Song
Jiashi Feng
Gao Huang
VLM
299
67
0
04 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Zhiyuan Liu
Maosong Sun
Jiansheng Wei
589
14
0
04 Nov 2024
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
Neural Information Processing Systems (NeurIPS), 2024
Zheng Zhan
Yushu Wu
Yifan Gong
Zichong Meng
Zhenglun Kong
Changdi Yang
Geng Yuan
Pu Zhao
Wei Niu
Yanzhi Wang
VGen
205
14
0
02 Nov 2024
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Xuanlin Jiang
Yang Zhou
Shiyi Cao
Eric Liang
Minlan Yu
224
26
0
02 Nov 2024
MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
Yuanlin Duan
Wenqi Jia
Miao Yin
Yu Cheng
Bo Yuan
MoE
418
25
0
01 Nov 2024
The Impact of Inference Acceleration on Bias of LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Elisabeth Kirsten
Ivan Habernal
Vedant Nanda
Muhammad Bilal Zafar
356
0
0
29 Oct 2024
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
Xiaoniu Song
Zihang Zhong
Rong Chen
Haibo Chen
MoE
488
20
0
29 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Neural Information Processing Systems (NeurIPS), 2024
Changwoo Lee
Soo Min Kwon
Qing Qu
Hun-Seok Kim
285
2
0
28 Oct 2024
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
Neural Information Processing Systems (NeurIPS), 2024
Ge Yang
Changyi He
Jinpei Guo
Jianyu Wu
Yifu Ding
Aishan Liu
Haotong Qin
Pengliang Ji
Xianglong Liu
MQ
279
9
0
28 Oct 2024
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Charbel Sakr
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
MQ
554
0
0
28 Oct 2024
LEGO: Language Model Building Blocks
Shrenik Bhansali
Alwin Jin
Tyler Lizzo
Larry Heck
163
0
0
23 Oct 2024
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
International Conference on Learning Representations (ICLR), 2024
Ashish Khisti
MohammadReza Ebrahimi
Hassan Dbouk
Arash Behboodi
Roland Memisevic
Christos Louizos
333
2
0
23 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
International Conference on Learning Representations (ICLR), 2024
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
322
7
0
23 Oct 2024
Self-calibration for Language Model Quantization and Pruning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Miles Williams
G. Chrysostomou
Nikolaos Aletras
MQ
1.0K
2
0
22 Oct 2024
Pruning Foundation Models for High Accuracy without Retraining
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Pu Zhao
Fei Sun
Xuan Shen
Pinrui Yu
Zhenglun Kong
Yanzhi Wang
Xue Lin
216
21
0
21 Oct 2024
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
Neural Information Processing Systems (NeurIPS), 2024
Jinda Jia
Cong Xie
Hanlin Lu
Daoce Wang
Hao Feng
...
Baixi Sun
Yanghua Peng
Zhi-Li Zhang
Xin Liu
Dingwen Tao
MQ
288
10
0
20 Oct 2024
EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Oliver Sieberling
Denis Kuznedelev
Eldar Kurtic
Dan Alistarh
MQ
416
5
0
18 Oct 2024
GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning
Neural Information Processing Systems (NeurIPS), 2024
Guibin Zhang
Haonan Dong
Yuchen Zhang
Zhixun Li
Dingshuo Chen
Kai Wang
Tianlong Chen
Yuxuan Liang
Dawei Cheng
Kun Wang
285
5
0
17 Oct 2024
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng
Zhang Cao
Huaizhi Qu
Zhengyu Zhang
Chang Guo
Yanyong Zhang
Zhichao Cao
Tianlong Chen
304
5
0
17 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
International Conference on Learning Representations (ICLR), 2024
Zhenhong Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Cunchun Li
Yongbin Li
489
37
0
17 Oct 2024
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
Yingsong Luo
Ling Chen
MQ
247
0
0
16 Oct 2024
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Akriti Jain
Saransh Sharma
Koyel Mukherjee
Soumyabrata Pal
344
0
0
16 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
503
6
0
16 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
231
24
0
15 Oct 2024
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Shangqian Gao
Chi-Heng Lin
Ting Hua
Tang Zheng
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
243
19
0
15 Oct 2024
LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs
Volker Strobel
Marco Dorigo
Mario Fritz
LRM
288
12
0
15 Oct 2024
SLaNC: Static LayerNorm Calibration
Mahsa Salmani
Nikita Trukhanov
I. Soloveychik
MQ
244
0
0
14 Oct 2024
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Haiquan Lu
Yefan Zhou
Shiwei Liu
Zhangyang Wang
Michael W. Mahoney
Yaoqing Yang
146
23
0
14 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
815
24
0
14 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
239
6
0
12 Oct 2024
DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
Yanfeng Jiang
Zelan Yang
B. Chen
Shen Li
Tao Li
MQ
151
4
0
11 Oct 2024
QEFT: Quantization for Efficient Fine-Tuning of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Changhun Lee
Jun-gyu Jin
Eunhyeok Park
MQ
214
4
0
11 Oct 2024
Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Abhinav Bandari
L. Yin
Cheng-Yu Hsieh
Ajay Kumar Jaiswal
Tianlong Chen
Li Shen
Ranjay Krishna
Shiwei Liu
193
15
0
09 Oct 2024
Chip-Tuning: Classify Before Language Models Say
Fangwei Zhu
Dian Li
Jiajun Huang
Gang Liu
Hui Wang
Zhifang Sui
216
0
0
09 Oct 2024