2301.00774
v3 (latest)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
2 January 2023
Elias Frantar
Dan Alistarh
VLM
HuggingFace (3 upvotes)
Github (799★)
Papers citing "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot"
50 / 664 papers shown
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
Shiwei Liu
Guanchen Tao
Yifei Zou
Derek Chow
Zichen Fan
Kauna Lei
Bangfei Pan
Dennis Sylvester
Gregory Kielian
Mehdi Saligane
231
12
0
31 Jan 2024
A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park
Jaehyeon Choi
Sojin Lee
U. Kang
MQ
259
20
0
27 Jan 2024
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
International Conference on Learning Representations (ICLR), 2024
Saleh Ashkboos
Maximilian L. Croci
Marcelo Gennari do Nascimento
Torsten Hoefler
James Hensman
VLM
411
279
0
26 Jan 2024
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
International Conference on Machine Learning (ICML), 2024
Bowen Zhao
Hannaneh Hajishirzi
Qingqing Cao
300
26
0
22 Jan 2024
PHOENIX: Open-Source Language Adaption for Direct Preference Optimization
Matthias Uhlig
Sigurd Schacht
Sudarshan Kamath Barkur
ALM
129
1
0
19 Jan 2024
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models
Transactions of the Association for Computational Linguistics (TACL), 2024
Jianhui Pang
Fanghua Ye
Longyue Wang
Dian Yu
Derek F. Wong
Shuming Shi
Zhaopeng Tu
ALM
228
23
0
16 Jan 2024
Transferring Core Knowledge via Learngenes
Fu Feng
Jing Wang
Xin Geng
201
9
0
16 Jan 2024
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Mingdao Liu
Aohan Zeng
Bowen Wang
Peng Zhang
Jie Tang
Yuxiao Dong
175
19
0
12 Jan 2024
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
International Conference on Machine Learning (ICML), 2024
Mahdi Nikdan
Soroush Tabesh
Elvir Crnčević
Dan Alistarh
433
45
0
09 Jan 2024
FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference
Zirui Liu
Qingquan Song
Q. Xiao
Sathiya Keerthi Selvaraj
Rahul Mazumder
Aman Gupta
Helen Zhou
159
7
0
08 Jan 2024
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
Symposium on Field Programmable Gate Arrays (FPGA), 2024
Shulin Zeng
Jun Liu
Guohao Dai
Xinhao Yang
Tianyu Fu
...
Zehao Wang
Ruoyu Zhang
Kairui Wen
Xuefei Ning
Yu Wang
261
108
0
08 Jan 2024
IoT in the Era of Generative AI: Vision and Challenges
IEEE Internet Computing (IEEE Internet Comput.), 2024
Xin Wang
Zhongwei Wan
Arvin Hekmati
M. Zong
Samiul Alam
Mi Zhang
Bhaskar Krishnamachari
236
5
0
03 Jan 2024
Fast and Optimal Weight Update for Pruned Large Language Models
Vladimír Boza
169
9
0
01 Jan 2024
The LLM Surgeon
Tycho F. A. van der Ouderaa
Markus Nagel
M. V. Baalen
Yuki Markus Asano
Tijmen Blankevoort
268
24
0
28 Dec 2023
Fast Inference of Mixture-of-Experts Language Models with Offloading
Artyom Eliseev
Denis Mazur
MoE
267
60
0
28 Dec 2023
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
268
68
0
28 Dec 2023
PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
Max Zimmer
Megi Andoni
Christoph Spiegel
Sebastian Pokutta
VLM
453
15
0
23 Dec 2023
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
344
118
0
23 Dec 2023
A Performance Evaluation of a Quantized Large Language Model on Various Smartphones
Tolga Çöplü
Marc Loedi
Arto Bendiken
Mykhailo Makohin
Joshua J. Bouw
Stephen Cobb
MQ
134
5
0
19 Dec 2023
Fluctuation-based Adaptive Structured Pruning for Large Language Models
Yongqi An
Xu Zhao
Tao Yu
Ming Tang
Jinqiao Wang
224
92
0
19 Dec 2023
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Symposium on Operating Systems Principles (SOSP), 2023
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
336
209
0
16 Dec 2023
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
Tianyi Chen
Tianyu Ding
Zhihui Zhu
Zeyu Chen
HsiangTao Wu
Ilya Zharkov
Luming Liang
180
5
0
15 Dec 2023
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Arnav Chavan
Nahush Lele
Deepak Gupta
161
1
0
12 Dec 2023
Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup
Xinjian Zhao
Yao-Min Zhao
Jiajia Liu
Jingdong Chen
Chenyi Zhuang
Jinjie Gu
Ruocheng Guo
Xiangyu Zhao
117
8
0
10 Dec 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
Zhihang Yuan
Yuzhang Shang
Yue Song
Dawei Yang
Qiang Wu
Yan Yan
Guangyu Sun
MQ
630
104
0
10 Dec 2023
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Symposium on Operating Systems Principles (SOSP), 2023
Yinwei Dai
Rui Pan
Anand Iyer
Kai Li
Ravi Netravali
148
15
0
08 Dec 2023
An LLM Compiler for Parallel Function Calling
Sehoon Kim
Suhong Moon
Ryan Tabrizi
Nicholas Lee
Michael W. Mahoney
Kurt Keutzer
A. Gholami
LRM
317
109
0
07 Dec 2023
JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Zihan Liu
Wentao Ni
Jingwen Leng
Yu Feng
Cong Guo
Quan Chen
Chao Li
Minyi Guo
Yuhao Zhu
157
24
0
04 Dec 2023
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
AAAI Conference on Artificial Intelligence (AAAI), 2023
Can Jin
Tianjin Huang
Yihua Zhang
Mykola Pechenizkiy
Sijia Liu
Shiwei Liu
Tianlong Chen
VLM
416
30
0
03 Dec 2023
Nonparametric Variational Regularisation of Pretrained Transformers
Fabio Fehr
James Henderson
140
2
0
01 Dec 2023
Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization
International Conference on Computer Aided Design (ICCAD), 2023
Jinhao Li
Jiaming Xu
Shiyao Li
Shan Huang
Jun Liu
Yaoxiu Lian
Guohao Dai
MQ
192
10
0
28 Nov 2023
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
Youhe Jiang
Ran Yan
Xiaozhe Yao
Yang Zhou
Beidi Chen
Binhang Yuan
SyDa
202
31
0
20 Nov 2023
A Speed Odyssey for Deployable Quantization of LLMs
Qingyuan Li
Ran Meng
Yiduo Li
Bo Zhang
Liang Li
Yifan Lu
Xiangxiang Chu
Yerui Sun
Yuchen Xie
MQ
194
10
0
16 Nov 2023
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Transactions of the Association for Computational Linguistics (TACL), 2023
G. Chrysostomou
Zhixue Zhao
Miles Williams
Nikolaos Aletras
HILM
191
21
0
15 Nov 2023
REST: Retrieval-Based Speculative Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Zhenyu He
Zexuan Zhong
Tianle Cai
Jason D. Lee
Di He
RALM
263
117
0
14 Nov 2023
Towards the Law of Capacity Gap in Distilling Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chen Zhang
Qiuchi Li
Dawei Song
Zheyu Ye
Yan Gao
Yan Hu
ELM
300
30
0
13 Nov 2023
Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models
Rocktim Jyoti Das
Mingjie Sun
Liqun Ma
Zhiqiang Shen
VLM
154
23
0
08 Nov 2023
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Ying Sheng
Shiyi Cao
Dacheng Li
Coleman Hooper
Nicholas Lee
...
Banghua Zhu
Lianmin Zheng
Kurt Keutzer
Joseph E. Gonzalez
Ion Stoica
MoE
255
138
0
06 Nov 2023
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
International Conference on Machine Learning (ICML), 2023
Sotiris Anagnostidis
Gregor Bachmann
Imanol Schlag
Thomas Hofmann
320
2
0
06 Nov 2023
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Bjorn Deiseroth
Max Meuer
Nikolas Gritsch
C. Eichenberg
P. Schramowski
Matthias Aßenmacher
Kristian Kersting
62
3
0
02 Nov 2023
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Conference on Machine Learning and Systems (MLSys), 2023
Zhixu Du
Shiyu Li
Yuhao Wu
Xiangyu Jiang
Jingwei Sun
Qilin Zheng
Yongkai Wu
Ang Li
Hai Helen Li
Yiran Chen
MoE
366
31
0
29 Oct 2023
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
International Conference on Machine Learning (ICML), 2023
Zichang Liu
Jue Wang
Tri Dao
Wanrong Zhu
Binhang Yuan
...
Anshumali Shrivastava
Ce Zhang
Yuandong Tian
Christopher Ré
Beidi Chen
BDL
284
271
0
26 Oct 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar
Dan Alistarh
MQ
MoE
208
38
0
25 Oct 2023
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Kaiyan Zhang
Ning Ding
Biqing Qi
Xuekai Zhu
Xinwei Long
Bowen Zhou
251
5
0
24 Oct 2023
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Tianyi Chen
Tianyu Ding
Badal Yadav
Ilya Zharkov
Luming Liang
245
37
0
24 Oct 2023
Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
Jianwei Li
Qi Lei
Wei Cheng
Dongkuan Xu
KELM
289
6
0
19 Oct 2023
Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection
Jianwei Li
Weizhi Gao
Qi Lei
Dongkuan Xu
312
3
0
19 Oct 2023
NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models
Jongwoo Ko
Seungjoon Park
Yujin Kim
Sumyeong Ahn
Du-Seong Chang
Euijai Ahn
SeYoung Yun
241
9
0
16 Oct 2023
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
Hang Shao
Bei Liu
Bo Xiao
Ke Zeng
Guanglu Wan
Yanmin Qian
236
27
0
14 Oct 2023
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
Saleh Ashkboos
Ilia Markov
Elias Frantar
Tingxuan Zhong
Xincheng Wang
Jie Ren
Torsten Hoefler
Dan Alistarh
MQ
SyDa
332
35
0
13 Oct 2023