SliceGPT: Compress Large Language Models by Deleting Rows and Columns
arXiv:2401.15024, 26 January 2024
Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman
Papers citing "SliceGPT: Compress Large Language Models by Deleting Rows and Columns" (30 papers shown)
- SPAP: Structured Pruning via Alternating Optimization and Penalty Methods. Hanyu Hu, Xiaoming Yuan. 06 May 2025.
- FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation. Chaitali Bhattacharyya, Yeseong Kim. 01 May 2025.
- Efficient LLMs with AMP: Attention Heads and MLP Pruning. Leandro Giusti Mugnaini, Bruno Yamamoto, Lucas Lauton de Alcantara, Victor Zacarias, Edson Bollis, Lucas Pellicer, A. H. R. Costa, Artur Jordao. 29 Apr 2025.
- When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks. Nan Zhang, Yusen Zhang, Prasenjit Mitra, Rui Zhang. 02 Apr 2025. [MQ, LRM]
- Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters. Roberto Garcia, Jerry Liu, Daniel Sorvisto, Sabri Eyuboglu. 23 Mar 2025.
- Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process. Yuanze Li, Shihao Yuan, Haolin Wang, Qizhang Li, Ming-Yu Liu, Chen Xu, Guangming Shi, Wangmeng Zuo. 17 Mar 2025.
- How can representation dimension dominate structurally pruned LLMs? Mingxue Xu, Lisa Alazraki, Danilo P. Mandic. 06 Mar 2025.
- SpinQuant: LLM quantization with learned rotations. Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort. 21 Feb 2025. [MQ]
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs. Minxuan Lv, Zhenpeng Su, Leiyu Pan, Yizhe Xiong, Zijia Lin, ..., Guiguang Ding, Cheng Luo, Di Zhang, Kun Gai, Songlin Hu. 18 Feb 2025. [MoE]
- Forget the Data and Fine-Tuning! Just Fold the Network to Compress. Dong Wang, Haris Šikić, Lothar Thiele, O. Saukh. 17 Feb 2025.
- EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models. Xingrun Xing, Zheng Liu, Shitao Xiao, Boyan Gao, Yiming Liang, Wanpeng Zhang, Haokun Lin, Guoqi Li, Jiajun Zhang. 10 Feb 2025. [LRM]
- Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models. J. P. Muñoz, Jinjie Yuan, Nilesh Jain. 28 Jan 2025. [Mamba]
- Merging Feed-Forward Sublayers for Compressed Transformers. Neha Verma, Kenton W. Murray, Kevin Duh. 10 Jan 2025. [AI4CE]
- CURing Large Models: Compression via CUR Decomposition. Sanghyeon Park, Soo-Mook Moon. 08 Jan 2025.
- GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference. Chao Zeng, Songwei Liu, Shu Yang, Fangmin Chen, Xing Mei, Lean Fu. 23 Dec 2024. [MQ]
- Puzzle: Distillation-Based NAS for Inference-Optimized LLMs. Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, ..., Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv. 28 Nov 2024.
- Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training. Elia Cunegatti, Leonardo Lucio Custode, Giovanni Iacca. 11 Nov 2024.
- EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation. Shih-yang Liu, Huck Yang, Nai Chit Fung, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen. 28 Oct 2024. [MQ]
- OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition. Stephen Zhang, V. Papyan. 20 Sep 2024. [VLM]
- HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning. Tianyi Chen, Xiaoyi Qu, David Aponte, Colby R. Banbury, Jongwoo Ko, Tianyu Ding, Yong Ma, Vladimir Lyapunov, Ilya Zharkov, Luming Liang. 11 Sep 2024.
- MoDeGPT: Modular Decomposition for Large Language Model Compression. Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu. 19 Aug 2024.
- A deeper look at depth pruning of LLMs. Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas Breuel, Jan Kautz, David M. Krueger, Pavlo Molchanov. 23 Jul 2024.
- Compact Language Models via Pruning and Knowledge Distillation. Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, M. Patwary, M. Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov. 19 Jul 2024. [SyDa, MQ]
- BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks. Jacob Nielsen, Peter Schneider-Kamp. 24 Jun 2024. [MQ]
- SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression. Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang. 12 Mar 2024. [MQ]
- Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers. Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Farber. 18 Feb 2024.
- On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference. Siyu Ren, Kenny Q. Zhu. 09 Feb 2024.
- The LLM Surgeon. Tycho F. A. van der Ouderaa, Markus Nagel, M. V. Baalen, Yuki Markus Asano, Tijmen Blankevoort. 28 Dec 2023.
- QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models. Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh. 13 Oct 2023. [MQ, SyDa]
- Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste. 31 Jan 2021. [MQ]