Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
Workshop on Representation Learning for NLP (RepL4NLP), 2020
19 February 2020 · arXiv:2002.08307 (v2, latest)
Mitchell A. Gordon, Kevin Duh, Nicholas Andrews

Papers citing "Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning"

Showing 50 of 195 citing papers (page 1 of 4).
CatBack: Universal Backdoor Attacks on Tabular Data via Categorical Encoding
Behrad Tajalli, Stefanos Koffas, S. Picek · AAML · 08 Nov 2025

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy · 07 Nov 2025

Efficient Adaptive Transformer: An Empirical Study and Reproducible Framework
Jan Miller · 14 Oct 2025
Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning
Minsik Choi, Hyegang Son, Changhoon Kim, Young Geun Kim · AAML · 10 Oct 2025

A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
Leonardo Iurada, Beatrice Occhiena, Tatiana Tommasi · VLM · 28 Sep 2025

Assortment of Attention Heads: Accelerating Federated PEFT with Head Pruning and Strategic Client Selection
Yeshwanth Venkatesha, Souvik Kundu, Priyadarshini Panda · 31 May 2025

Generative Artificial Intelligence for Internet of Things Computing: A Systematic Survey
Fabrizio Mangione, Claudio Savaglio, Giancarlo Fortino · 10 Apr 2025
As easy as PIE: understanding when pruning causes language models to disagree
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Pietro Tropeano, Maria Maistro, Tuukka Ruotsalo, Christina Lioma · 27 Mar 2025

Moss: Proxy Model-based Full-Weight Aggregation in Federated Learning with Heterogeneous Models
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2025
Y. Cai, Ziqi Zhang, Ding Li, Yao Guo, Xiangqun Chen · 13 Mar 2025

Signal Collapse in One-Shot Pruning: When Sparse Models Fail to Distinguish Neural Representations
Dhananjay Saikumar, Blesson Varghese · 18 Feb 2025
On the Compression of Language Models for Code: An Empirical Study on CodeBERT
IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2024
Giordano d'Aloisio, Luca Traini, Federica Sarro, A. Marco · 18 Dec 2024

SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar, Linfeng Wen, Mingu Kang · 15 Nov 2024

Exploring the Benefit of Activation Sparsity in Pre-training
International Conference on Machine Learning (ICML), 2024
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou · MoE · 04 Oct 2024
Exploiting Student Parallelism for Efficient GPU Inference of BERT-like Models in Online Services
Weiyan Wang, Yilun Jin, Yiming Zhang, Victor Junqiu Wei, Han Tian, Li Chen, Jinbao Xue, Yangyu Tao, Di Wang, Kai Chen · 22 Aug 2024

Cross-layer Attention Sharing for Pre-trained Large Language Models
Yongyu Mu, Yuzhang Wu, Yuchun Fan, Chenglong Wang, Hengyu Li, ..., Murun Yang, Fandong Meng, Jie Zhou, Tong Xiao, Jingbo Zhu · 04 Aug 2024

Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li, Yijun Dong, Qi Lei · 26 Jul 2024
A Complete Survey on LLM-based AI Chatbots
Sumit Kumar Dam, Choong Seon Hong, Yu Qiao, Chaoning Zhang · 17 Jun 2024

Understanding Token Probability Encoding in Output Embeddings
Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue · 03 Jun 2024

Sparsity-Accelerated Training for Large Language Models
Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu · LRM · 03 Jun 2024

Reward-based Input Construction for Cross-document Relation Extraction
Byeonghu Na, Suhyeon Jo, Yeongmin Kim, Il-Chul Moon · 31 May 2024
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mi Zhou · BDL · 07 May 2024

SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models
Samir Arora, Liangliang Wang · 30 Apr 2024

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina, Goran Frehse, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci · VLM · 08 Apr 2024

LayerNorm: A key component in parameter-efficient fine-tuning
Taha ValizadehAslani, Hualou Liang · 29 Mar 2024
SEVEN: Pruning Transformer Model by Reserving Sentinels
IEEE International Joint Conference on Neural Networks (IJCNN), 2024
Jinying Xiao, Ping Li, Jie Nie, Zhe Tang · 19 Mar 2024

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao, Peng Ye, Shengze Li, Chong Yu, Yansong Tang, Jiwen Lu, Tao Chen · 05 Mar 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He · MQ · 15 Feb 2024
Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Michele Mastromattei, Fabio Massimo Zanzotto · VLM · 05 Feb 2024

DE³-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He, Tao Gui, Weiping Ding, Duoqian Miao, Jun Zhao, Liang Hu, LongBing Cao · 03 Feb 2024

Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu, Haoyang He, Tianle Han, Xu-Yao Zhang, Mengyuan Liu, ..., Xiaoyan Cai, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge · SyDa · 04 Jan 2024
DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization
Rahul Chand, Yashoteja Prabhu, Pratyush Kumar · 20 Dec 2023

BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xingrun Xing, Li Du, Xinyuan Wang, Xianlin Zeng, Yequan Wang, Zheng Zhang, Jiajun Zhang · 14 Dec 2023

Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup
Xinjian Zhao, Yao-Min Zhao, Jiajia Liu, Jingdong Chen, Chenyi Zhuang, Jinjie Gu, Ruocheng Guo, Xiangyu Zhao · 10 Dec 2023

The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Srinath Namburi, Makesh Narsimhan Sreedhar, Srinath Srinivasan, Frederic Sala · MQ · 01 Dec 2023
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha · 15 Nov 2023

EELBERT: Tiny Models through Dynamic Embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Gabrielle Cohn, Rishika Agarwal, Deepanshu Gupta, Siddharth Patwardhan · 31 Oct 2023

MOSEL: Inference Serving Using Dynamic Modality Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bodun Hu, Le Xu, Jeongyoon Moon, N. Yadwadkar, Aditya Akella · 27 Oct 2023

Outlier Dimensions Encode Task-Specific Knowledge
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
William Rudman, Catherine Chen, Carsten Eickhoff · 26 Oct 2023

Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Wang, Rui Yan · 24 Oct 2023

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou · 24 Oct 2023
Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu · KELM · 19 Oct 2023

Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection
Jianwei Li, Weizhi Gao, Qi Lei, Dongkuan Xu · 19 Oct 2023

Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Huiyin Xue, Nikolaos Aletras · 11 Oct 2023

Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
Song Guo, Jiahang Xu, Li Zhang, Mao Yang · 08 Oct 2023

Neural Language Model Pruning for Automatic Speech Recognition
Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, M. Nußbaum-Thom, Youssef Oualil · 05 Oct 2023
Mitigating Shortcuts in Language Models with Soft Label Encoding
International Conference on Language Resources and Evaluation (LREC), 2023
Zirui He, Huiqi Deng, Haiyan Zhao, Ninghao Liu, Jundong Li · 17 Sep 2023

A Survey on Model Compression for Large Language Models
Transactions of the Association for Computational Linguistics (TACL), 2023
Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang · 15 Aug 2023

SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models
Sara Babakniya, A. Elkordy, Yahya H. Ezzeldin, Qingfeng Liu, Kee-Bong Song, Mostafa El-Khamy, Salman Avestimehr · 12 Aug 2023

DPBERT: Efficient Inference for BERT based on Dynamic Planning
European Conference on Artificial Intelligence (ECAI), 2023
Weixin Wu, H. Zhuo · 26 Jul 2023

Learned Thresholds Token Merging and Pruning for Vision Transformers
Maxim Bonnaerens, J. Dambre · 20 Jul 2023