2302.03773
What Matters In The Structured Pruning of Generative Language Models?
7 February 2023
Michael Santacroce
Zixin Wen
Yelong Shen
Yuanzhi Li
Papers citing
"What Matters In The Structured Pruning of Generative Language Models?"
25 / 25 papers shown
Title
A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs
Xuan Ding
Rui Sun
Yunjian Zhang
Xiu Yan
Yueqi Zhou
Kaihao Huang
Suzhong Fu
Chuanlong Xie
Yao Zhu
55
1
0
26 Feb 2025
When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
Weilan Wang
Yu Mao
Dongdong Tang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
55
1
0
24 Feb 2025
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
Jorge García-Carrasco
A. Maté
Juan Trujillo
71
0
0
20 Dec 2024
QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models
Changhai Zhou
Yuhua Zhou
Shijie Han
Qian Qiao
Hongguang Li
MQ
72
0
0
16 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
59
0
0
03 Dec 2024
AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning
Changhai Zhou
Shiyang Zhang
Yuhua Zhou
Zekai Liu
Shichao Weng
MQ
59
0
0
21 Nov 2024
Cross-Domain Content Generation with Domain-Specific Small Language Models
Ankit Maloo
Abhinav Garg
CLL
22
0
0
19 Sep 2024
Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Pihe Hu
Shaolong Li
Longbo Huang
16
0
0
21 Aug 2024
Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations
Leo Donisch
Sigurd Schacht
Carsten Lanquillon
14
2
0
06 Aug 2024
RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs
Changhai Zhou
Shijie Han
Shiyang Zhang
Shichao Weng
Zekai Liu
Cheng Jin
16
1
0
22 Jun 2024
Exploring Activation Patterns of Parameters in Language Models
Yudong Wang
Damai Dai
Zhifang Sui
19
1
0
28 May 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
36
30
0
15 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
24
26
0
05 Feb 2024
Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods
Bo-Kyeong Kim
Geonmin Kim
Tae-Ho Kim
Thibault Castells
Shinkook Choi
Junho Shin
Hyoung-Kyu Song
49
28
0
05 Feb 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
48
75
0
23 Dec 2023
Efficient LLM inference solution on Intel GPU
Hui Wu
Yi Gan
Feng Yuan
Jing Ma
Wei Zhu
...
Hong Zhu
Yuhua Zhu
Xiaoli Liu
Jinghui Gu
Peng Zhao
10
3
0
19 Dec 2023
NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models
Jongwoo Ko
Seungjoon Park
Yujin Kim
Sumyeong Ahn
Du-Seong Chang
Euijai Ahn
SeYoung Yun
9
4
0
16 Oct 2023
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
Gintare Karolina Dziugaite
LRM
27
5
0
07 Oct 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
118
58
0
06 Oct 2023
A Survey on Model Compression for Large Language Models
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
13
98
0
15 Aug 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan
Yuanzhi Li
SyDa
LRM
10
232
0
12 May 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
15
14
0
01 Feb 2023
A Short Study on Compressing Decoder-Based Language Models
Tianda Li
Yassir El Mesbahi
I. Kobyzev
Ahmad Rashid
A. Mahmud
Nithin Anchuri
Habib Hajimolahoseini
Yang Liu
Mehdi Rezagholizadeh
76
25
0
16 Oct 2021
What is the State of Neural Network Pruning?
Davis W. Blalock
Jose Javier Gonzalez Ortiz
Jonathan Frankle
John Guttag
172
1,018
0
06 Mar 2020
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
396
2,576
0
03 Sep 2019