Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2301.00774
Cited By
v1
v2
v3 (latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
Github (799★)
Papers citing
"SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
50 / 665 papers shown
Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
Chaoyue Niu
Yucheng Ding
Junhui Lu
Zhengxiang Huang
Hang Zeng
Yutong Dai
Xuezhen Tu
Chengfei Lv
Fan Wu
Guihai Chen
331
2
0
17 Apr 2025
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang
Yang Sui
Shaochen Zhong
Vipin Chaudhary
Helen Zhou
Xia Hu
Anshumali Shrivastava
MQ
280
10
0
15 Apr 2025
Understanding and Optimizing Multi-Stage AI Inference Pipelines
Abhimanyu Bambhaniya
Hanjiang Wu
Suvinay Subramanian
Sudarshan Srinivasan
Souvik Kundu
Amir Yazdanbakhsh
Suvinay Subramanian
Madhu Kumar
Tushar Krishna
986
0
0
14 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
368
3
0
14 Apr 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
460
1
0
14 Apr 2025
Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash
Fucheng Jia
Zewen Wu
Shiqi Jiang
Huiqiang Jiang
Qianxi Zhang
Yue Yang
Yunxin Liu
Ju Ren
Deyu Zhang
Ting Cao
743
2
0
11 Apr 2025
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
International Symposium on Computer Architecture (ISCA), 2025
Jiaming Xu
Jiayi Pan
Yongkang Zhou
Siming Chen
Jiajian Li
Yaoxiu Lian
Junyi Wu
Guohao Dai
LRM
298
8
0
11 Apr 2025
SD
2
^2
2
: Self-Distilled Sparse Drafters
Mike Lasby
Nish Sinnadurai
Valavan Manohararajah
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
785
1
0
10 Apr 2025
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu
Yuxuan Sun
Manyi Zhang
Haoli Bai
Xianzhi Yu
Tiezheng Yu
C. Yuan
Lu Hou
MQ
LRM
426
27
0
07 Apr 2025
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
Yanbiao Liang
Huihong Shi
Haikuo Shao
Zhongfeng Wang
247
3
0
07 Apr 2025
Saliency-driven Dynamic Token Pruning for Large Language Models
Yao Tao
Yehui Tang
Yun Wang
Mingjian Zhu
Hailin Hu
Yunhe Wang
449
4
0
06 Apr 2025
Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression
Ivan Ilin
Peter Richtárik
163
5
0
06 Apr 2025
Compression Laws for Large Language Models
Ayan Sengupta
Siddhant Chaudhary
Tanmoy Chakraborty
205
1
0
06 Apr 2025
Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
Vishnu Kabir Chhabra
Mohammad Mahdi Khalili
AI4CE
242
0
0
05 Apr 2025
Entropy-Based Block Pruning for Efficient Large Language Models
Liangwei Yang
Yuhui Xu
Juntao Tan
Doyen Sahoo
Siyang Song
Caiming Xiong
Han Wang
Shelby Heinecke
AAML
210
0
0
04 Apr 2025
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
Nan Zhang
Eugene Kwek
Yusen Zhang
Ngoc-Hieu Nguyen
Prasenjit Mitra
Rui Zhang
MQ
LRM
527
8
0
02 Apr 2025
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang
Ligong Han
Kai Xu
Akash Srivastava
MQ
369
2
0
31 Mar 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Hui Yuan
Guang Dai
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
317
1
0
31 Mar 2025
Task-Aware Parameter-Efficient Fine-Tuning of Large Pre-Trained Models at the Edge
Senkang Hu
Yanan Ma
Yihang Tao
Zhengru Fang
Zihan Fang
Yiqin Deng
Sam Kwong
Yuguang Fang
259
2
0
29 Mar 2025
Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models
Runpeng Dai
Run Yang
Fan Zhou
Hongtu Zhu
289
0
0
28 Mar 2025
STADE: Standard Deviation as a Pruning Metric
Diego Coello de Portugal Mecke
Haya Alyoussef
Ilia Koloiarov
Ilia Koloiarov
Lars Schmidt-Thieme
Lars Schmidt-Thieme
305
0
0
28 Mar 2025
As easy as PIE: understanding when pruning causes language models to disagree
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Pietro Tropeano
Maria Maistro
Tuukka Ruotsalo
Christina Lioma
246
0
0
27 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
916
3
0
27 Mar 2025
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
Chang Gao
Kang Zhao
Runqi Wang
Jianfei Chen
Liping Jing
271
1
0
24 Mar 2025
Energy-Aware LLMs: A step towards sustainable AI for downstream applications
Nguyen Phuc Tran
Brigitte Jaumard
Oscar Delgado
202
1
0
22 Mar 2025
Large Language Model Compression via the Nested Activation-Aware Decomposition
Jun Lu
Tianyi Xu
Bill Ding
David Li
Yu Kang
236
1
0
21 Mar 2025
Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
Daniel Haziza
Timothy Chou
Dhruv Choudhary
Luca Wehrstedt
Francisco Massa
Jiecao Yu
Geonhwa Jeong
Supriya Rao
Patrick Labatut
Jesse Cai
371
6
0
20 Mar 2025
EfficientLLaVA:Generalizable Auto-Pruning for Large Vision-language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Yinan Liang
Xiping Hu
Xiuwei Xu
Jie Zhou
Jiwen Lu
VLM
LRM
250
5
0
19 Mar 2025
Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency
Jiangxuan Long
Zhao Song
Chiwun Yang
AI4TS
930
2
0
18 Mar 2025
Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
Yuanze Li
Shihao Yuan
Haolin Wang
Qizhang Li
Ming-Yu Liu
Chen Xu
Guangming Shi
Wangmeng Zuo
298
4
0
17 Mar 2025
ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
E. Georganas
Dhiraj D. Kalamkar
Alexander Kozlov
A. Heinecke
MQ
923
4
0
17 Mar 2025
SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Xin Wang
Samiul Alam
Zhongwei Wan
Mengqi Li
Hao Fei
MQ
258
22
0
16 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Ning Yang
Jun Wang
950
2
0
15 Mar 2025
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs
Nir Ailon
Akhiad Bercovich
Yahel Uffenheimer
Omri Weinstein
457
3
0
15 Mar 2025
Towards Extreme Pruning of LLMs with Plug-and-Play Mixed Sparsity
Chi Xu
Gefei Zhang
Yantong Zhu
Luca Benini
Guosheng Hu
Yawei Li
Zhihong Zhang
180
1
0
14 Mar 2025
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
European Conference on Computer Systems (EuroSys), 2025
Chenpeng Wu
Qiqi Gu
Heng Shi
Jianguo Yao
Haibing Guan
MoE
210
5
0
13 Mar 2025
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
280
2
0
10 Mar 2025
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
Xun Liang
Ding Chen
Zhen Tao
Pengnian Qi
Chenyang Xi
Hanyu Wang
Jihao Zhao
Feiyu Xiong
Shichao Song
Zhiyu Li
VLM
250
0
0
10 Mar 2025
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
Wenhui Zhang
Huiyu Xu
Peng Kuang
Zeqing He
Ziqi Zhu
Kui Ren
AAML
PILM
226
3
0
09 Mar 2025
IteRABRe: Iterative Recovery-Aided Block Reduction
Haryo Akbarianto Wibowo
Israfel Salazar
Hideki Tanaka
Masao Utiyama
Alham Fikri Aji
Mary Dabre
274
1
0
08 Mar 2025
Sample-aware Adaptive Structured Pruning for Large Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2025
Jun Kong
Xinge Ma
Jin Wang
Xuejie Zhang
215
0
0
08 Mar 2025
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
Wenke Huang
Jian Liang
Xianda Guo
Yiyang Fang
Guancheng Wan
...
Bin Yang
He Li
Jiawei Shao
Mang Ye
Di Lin
OffRL
LRM
MLLM
KELM
VLM
350
8
0
06 Mar 2025
Wanda++: Pruning Large Language Models via Regional Gradients
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yifan Yang
Kai Zhen
Bhavana Ganesh
Aram Galstyan
Goeric Huybrechts
...
S. Bodapati
Nathan Susanj
Zheng Zhang
Jack FitzGerald
Abhishek Kumar
567
11
0
06 Mar 2025
How can representation dimension dominate structurally pruned LLMs?
Mingxue Xu
Lisa Alazraki
Danilo Mandic
264
0
0
06 Mar 2025
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
Alireza Behtash
Marijan Fofonjka
Ethan Baird
Tyler Mauer
Hossein Moghimifam
David Stout
Joel Dennison
MQ
436
3
0
06 Mar 2025
Revisiting Large Language Model Pruning using Neuron Semantic Attribution
Yizhuo Ding
Xinwei Sun
Yanwei Fu
Guosheng Hu
177
3
0
03 Mar 2025
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs
Yi-Lin Sung
Prateek Yadav
Jialu Li
Jaehong Yoon
Joey Tianyi Zhou
MQ
256
2
0
03 Mar 2025
Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs
Yiheng Yang
Yujie Wang
Chi Ma
Lei Yu
Emmanuele Chersoni
Chu-Ren Huang
308
1
0
26 Feb 2025
CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging
Zongzhen Yang
Binhang Qi
Hailong Sun
Wenrui Long
Ruobing Zhao
Yantao Du
MoMe
273
4
0
26 Feb 2025
Compressing Language Models for Specialized Domains
Miles Williams
G. Chrysostomou
Vitor Jeronymo
Nikolaos Aletras
MQ
304
1
0
25 Feb 2025
Previous
1
2
3
4
5
6
...
12
13
14
Next