Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1910.04732
Cited By
Structured Pruning of Large Language Models
10 October 2019
Ziheng Wang
Jeremy Wohlwend
Tao Lei
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Structured Pruning of Large Language Models"
35 / 35 papers shown
Title
Optimizing Recommendations using Fine-Tuned LLMs
Prabhdeep Cheema
Erhan Guven
OffRL
26
0
0
11 May 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
45
0
0
01 May 2025
Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
Ekaterina Artemova
Akim Tsvigun
Dominik Schlechtweg
Natalia Fedorova
Konstantin Chernyshev
Sergei Tilga
Boris Obmoroshev
SyDa
VLM
107
0
0
28 Jan 2025
Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
Danyal Aftab
Steven Davy
ALM
49
0
0
10 Jan 2025
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park
Soo-Mook Moon
38
0
0
08 Jan 2025
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
71
6
0
19 Aug 2024
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar
Alexander Wettig
Dan Friedman
Danqi Chen
58
16
0
24 Jun 2024
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
Lucio Dery
Steven Kolawole
Jean-Francois Kagey
Virginia Smith
Graham Neubig
Ameet Talwalkar
39
28
0
08 Feb 2024
Matrix Compression via Randomized Low Rank and Low Precision Factorization
R. Saha
Varun Srivastava
Mert Pilanci
13
19
0
17 Oct 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
25
5
0
07 Aug 2023
An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
17
41
0
10 Mar 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Chen Liang
Haoming Jiang
Zheng Li
Xianfeng Tang
Bin Yin
Tuo Zhao
VLM
16
24
0
19 Feb 2023
Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models
Mohammadreza Banaei
Klaudia Bałazy
Artur Kasymov
R. Lebret
Jacek Tabor
Karl Aberer
OffRL
14
0
0
08 Feb 2023
What Matters In The Structured Pruning of Generative Language Models?
Michael Santacroce
Zixin Wen
Yelong Shen
Yuan-Fang Li
18
32
0
07 Feb 2023
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney
Man Luo
Chitta Baral
RALM
19
11
0
23 Nov 2022
Compressing Transformer-based self-supervised models for speech processing
Tzu-Quan Lin
Tsung-Huan Yang
Chun-Yao Chang
Kuang-Ming Chen
Tzu-hsun Feng
Hung-yi Lee
Hao Tang
32
6
0
17 Nov 2022
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Bowen Shen
Zheng Lin
Yuanxin Liu
Zhengxiao Liu
Lei Wang
Weiping Wang
VLM
33
4
0
27 Oct 2022
Demystifying Map Space Exploration for NPUs
Sheng-Chun Kao
A. Parashar
Po-An Tsai
T. Krishna
30
11
0
07 Oct 2022
Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU
Genghan Zhang
Yuetong Zhao
Yanting Tao
Zhongming Yu
Guohao Dai
Sitao Huang
Yuanyuan Wen
Pavlos Petoumenos
Yu Wang
41
4
0
07 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
Spartan: Differentiable Sparsity via Regularized Transportation
Kai Sheng Tai
Taipeng Tian
Ser-Nam Lim
13
11
0
27 May 2022
A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices
Liang Huang
Senjie Liang
Feiyang Ye
Nan Gao
50
4
0
16 May 2022
Accelerating Attention through Gradient-Based Learned Runtime Pruning
Zheng Li
Soroush Ghodrati
Amir Yazdanbakhsh
H. Esmaeilzadeh
Mingu Kang
19
16
0
07 Apr 2022
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang
A. Khetan
Rene Bidart
Zohar S. Karnin
17
14
0
27 Mar 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
43
211
0
16 Feb 2022
Fast Monte-Carlo Approximation of the Attention Mechanism
Hyunjun Kim
Jeonggil Ko
17
2
0
30 Jan 2022
Efficient Neural Network Training via Forward and Backward Propagation Sparsification
Xiao Zhou
Weizhong Zhang
Zonghao Chen
Shizhe Diao
Tong Zhang
24
46
0
10 Nov 2021
Layer-wise Model Pruning based on Mutual Information
Chun Fan
Jiwei Li
Xiang Ao
Fei Wu
Yuxian Meng
Xiaofei Sun
38
19
0
28 Aug 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
30
57
0
13 Jul 2021
Effective Sparsification of Neural Networks with Global Sparsity Constraint
Xiao Zhou
Weizhong Zhang
Hang Xu
Tong Zhang
11
61
0
03 May 2021
Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
Dongkuan Xu
Ian En-Hsu Yen
Jinxi Zhao
Zhibin Xiao
VLM
AAML
28
55
0
18 Apr 2021
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh
A. Mahmood
AI4TS
55
92
0
23 Mar 2021
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
13
466
0
15 May 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
225
574
0
12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,950
0
20 Apr 2018
1