2301.00774
Cited By
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
VLM
HuggingFace (3 upvotes)
Github (799★)
Papers citing
"SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
50 / 665 papers shown
Compressing Large Language Models with Automated Sub-Network Search
R. Sukthanker
B. Staffler
Katharina Eggensperger
Aaron Klein
LRM
321
0
0
09 Oct 2024
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
Ruijia Niu
D. Wu
Rose Yu
Yi-An Ma
513
2
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
IEEE Circuits and Systems Magazine (IEEE CSM), 2024
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
226
19
0
08 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
253
1
0
08 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
International Conference on Learning Representations (ICLR), 2024
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
298
23
0
08 Oct 2024
ESPACE: Dimensionality Reduction of Activations for Model Compression
Neural Information Processing Systems (NeurIPS), 2024
Charbel Sakr
Brucek Khailany
260
14
0
07 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
633
49
0
06 Oct 2024
ARB-LLM: Alternating Refined Binarizations for Large Language Models
International Conference on Learning Representations (ICLR), 2024
Zhiteng Li
Xinyu Yan
Tianao Zhang
Haotong Qin
Dong Xie
Jiang Tian
Peng Wang
Linghe Kong
Yulun Zhang
Yunbo Wang
MQ
323
18
0
04 Oct 2024
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
International Conference on Learning Representations (ICLR), 2024
Jingcun Wang
Yu-Guang Chen
Ing-Chao Lin
Bing Li
Grace Li Zhang
210
19
0
02 Oct 2024
Getting Free Bits Back from Rotational Symmetries in LLMs
Wenlin Chen
Gergely Flamich
José Miguel Hernández-Lobato
MQ
202
0
0
02 Oct 2024
Exploring Gen-AI applications in building research and industry: A review
Building Simulation (BS), 2024
Hanlong Wan
Jian Zhang
Yan Chen
Weili Xu
Fan Feng
AI4CE
319
9
0
01 Oct 2024
Aggressive Post-Training Compression on Extremely Large Language Models
Zining Zhang
Yao Chen
Bingsheng He
Zhenjie Zhang
82
0
0
30 Sep 2024
EEG Emotion Copilot: Optimizing Lightweight LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation
Neural Networks (NN), 2024
Hongyu Chen
Weiming Zeng
Chong Chen
Luhui Cai
Haiwei Yang
...
Wei Zhang
Yuchen Ren
Hongjie Yan
W. Siok
Nizhuan Wang
339
0
0
30 Sep 2024
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Asia and South Pacific Design Automation Conference (ASP-DAC), 2024
Shaobo Ma
Chao Fang
Haikuo Shao
Zhongfeng Wang
318
5
0
26 Sep 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Gongfan Fang
Hongxu Yin
Saurav Muralidharan
Greg Heinrich
Jeff Pool
Jan Kautz
Pavlo Molchanov
Xinchao Wang
174
35
0
26 Sep 2024
Pruning Multilingual Large Language Models for Multilingual Inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Hwichan Kim
Jun Suzuki
Tosho Hirasawa
Mamoru Komachi
418
1
0
25 Sep 2024
Demystifying Issues, Causes and Solutions in LLM Open-Source Projects
Journal of Systems and Software (JSS), 2024
Yangxiao Cai
Peng Liang
Yifei Wang
Zengyang Li
Mojtaba Shahin
299
8
0
25 Sep 2024
Enhancing Aspect-based Sentiment Analysis in Tourism Using Large Language Models and Positional Information
Chun Xu
Mengmeng Wang
Yan Ren
Shaolin Zhu
219
10
0
23 Sep 2024
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
International Conference on Computational Linguistics (COLING), 2024
Yuxin Wang
Minghua Ma
Zekun Wang
Jingchang Chen
Huiming Fan
Liping Shan
Qing Yang
Dongliang Xu
Ming Liu
Bing Qin
178
6
0
20 Sep 2024
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
International Conference on Learning Representations (ICLR), 2024
Stephen Zhang
Vardan Papyan
VLM
556
16
0
20 Sep 2024
Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models
Bishwash Khanal
Jeffery M. Capone
267
2
0
17 Sep 2024
KVPruner: Structural Pruning for Faster and Memory-Efficient Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Bo Lv
Quan Zhou
Xuanang Ding
Yan Wang
Zeming Ma
VLM
178
4
0
17 Sep 2024
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Neural Information Processing Systems (NeurIPS), 2024
Yuezhou Hu
Jun-Jie Zhu
Jianfei Chen
414
5
0
13 Sep 2024
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jaeseong Lee
Seung-won Hwang
Aurick Qiao
Daniel F Campos
Z. Yao
Yuxiong He
287
10
0
10 Sep 2024
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
Yao Shu
Wenyang Hu
Szu Hui Ng
Bryan Kian Hsiang Low
Fei Richard Yu
FedML
456
3
0
10 Sep 2024
Achieving Peak Performance for Large Language Models: A Systematic Review
IEEE Access (IEEE Access), 2024
Z. R. K. Rostam
Sándor Szénási
Gábor Kertész
321
18
0
07 Sep 2024
CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Junhui He
Shangyu Wu
Weidong Wen
Chun Jason Xue
Qingan Li
96
8
0
02 Sep 2024
OnlySportsLM: Optimizing Sports-Domain Language Models with SOTA Performance under Billion Parameters
Zexin Chen
Chengxi Li
Xiangyu Xie
Parijat Dube
ALM
200
4
0
30 Aug 2024
Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering
Nicholas Pochinkov
Ben Pasero
Skylar Shibayama
185
6
0
30 Aug 2024
The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information
Neural Information Processing Systems (NeurIPS), 2024
Diyuan Wu
Ionut-Vlad Modoranu
M. Safaryan
Denis Kuznedelev
Dan Alistarh
331
6
0
30 Aug 2024
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
Maxim Zhelnin
Viktor Moskvoretskii
Egor Shvetsov
Egor Venediktov
Mariya Krylova
Aleksandr Zuev
Evgeny Burnaev
261
6
0
27 Aug 2024
MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning
Seungbeom Hu
ChanJun Park
Andrew Ferraiuolo
Sang-Ki Ko
Jinwoo Kim
Haein Song
Jieung Kim
353
2
0
24 Aug 2024
A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
313
25
0
22 Aug 2024
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), 2024
Elias Frantar
Roberto L. Castro
Jiale Chen
Torsten Hoefler
Dan Alistarh
MQ
230
28
0
21 Aug 2024
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
Chi Ma
Mincong Huang
Ying Zhang
Chao Wang
Yujie Wang
Lei Yu
Chuan Liu
Wei Lin
AI4CE
LLMSV
250
3
0
21 Aug 2024
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
International Conference on Computational Linguistics (COLING), 2024
Guanchen Li
Xiandong Zhao
Lian Liu
Zeping Li
Dong Li
Lu Tian
Jie He
Ashish Sirasao
E. Barsoum
VLM
172
2
0
20 Aug 2024
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
Yupeng Su
Ziyi Guan
Xiaoqun Liu
Tianlai Jin
Dongkuan Wu
Zhengfei Chen
G. Chesi
Ngai Wong
Hao Yu
179
2
0
20 Aug 2024
Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches
Yanjie Dong
Xiaoyi Fan
Fangxin Wang
Chengming Li
Victor C. M. Leung
Xiping Hu
259
11
0
20 Aug 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
International Conference on Learning Representations (ICLR), 2024
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
757
29
0
19 Aug 2024
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
Tiansheng Huang
Gautam Bhattacharya
Pratik Joshi
Josh Kimball
Ling Liu
AAML
MoMe
596
48
0
18 Aug 2024
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xianzhen Luo
Yixuan Wang
Qingfu Zhu
Zhiming Zhang
Xuanyu Zhang
Qing Yang
Dongliang Xu
454
24
0
16 Aug 2024
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Zhongyu Zhao
Menghang Dong
Rongyu Zhang
Wenzhao Zheng
Yunpeng Zhang
Huanrui Yang
Dalong Du
Kurt Keutzer
Shanghang Zhang
327
1
0
15 Aug 2024
KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning
International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2024
Kaiqi Zhang
Jing Zhao
Rui Chen
312
5
0
15 Aug 2024
Post-Training Sparse Attention with Double Sparsity
Shuo Yang
Ying Sheng
Joseph E. Gonzalez
Ion Stoica
Lianmin Zheng
296
25
0
11 Aug 2024
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
IEEE International Symposium on Workload Characterization (IISWC), 2024
Jaehong Cho
Minsu Kim
Hyunmin Choi
Guseul Heo
Jongse Park
378
24
0
10 Aug 2024
A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models
Pengxiang Zhao
Hanyu Hu
Ping Li
Yi Zheng
Zhefeng Wang
Xiaoming Yuan
200
2
0
07 Aug 2024
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
Angie Boggust
Venkatesh Sivaraman
Yannick Assogba
Donghao Ren
Dominik Moritz
Fred Hohman
VLM
228
10
0
06 Aug 2024
Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations
Leo Donisch
Sigurd Schacht
Carsten Lanquillon
298
3
0
06 Aug 2024
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
International Conference on Learning Representations (ICLR), 2024
Peijie Dong
Lujun Li
Dayou Du
Yuhan Chen
Zhenheng Tang
...
Wei Xue
Wenhan Luo
Qi-fei Liu
Yi-Ting Guo
Xiaowen Chu
MQ
202
31
0
03 Aug 2024
Finch: Prompt-guided Key-Value Cache Compression
Transactions of the Association for Computational Linguistics (TACL), 2024
Giulio Corallo
Paolo Papotti
425
4
0
31 Jul 2024