BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
7 February 2020
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou

Papers citing "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing"

50 / 102 papers shown
Network of Theseus (like the ship)
Vighnesh Subramaniam
C. Conwell
Boris Katz
Andrei Barbu
Brian Cheung
03 Dec 2025
Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers
Rowan Bradbury
Aniket Srinivasan Ashok
Sai Ram Kasanagottu
Gunmay Jhingran
Shuai Meng
24 Nov 2025
A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal
Mrigank Rochan
Chanchal K. Roy
07 Nov 2025
Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion
Xianjun Gao
Jianchun Liu
Hongli Xu
Liusheng Huang
28 Oct 2025
SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions
Ziyi Wang
Nan Jiang
Guang Lin
Qifan Song
10 Oct 2025
CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev
Denis Makhov
Magauiya Zhussip
Ammar Ali
Stamatios Lefkimmiatis
26 Sep 2025
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
Yingming Zheng
Hanqi Li
Kai Yu
Lu Chen
23 Sep 2025
An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang
Zezhou Yang
Cuiyun Gao
Xin Xia
Qing Liao
21 Aug 2025
Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints
Sandeep Reddy
Kabir Khan
Rohit Patil
Ananya Chakraborty
Faizan A. Khan
Swati Kulkarni
Arjun Verma
Neha Singh
14 Aug 2025
General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong
Jinglun Li
Xinyu Zhou
Shilin Yan
Pinxue Guo
...
Runze Li
Xingdong Sheng
Wei Zhang
Hong Lu
Wenqiang Zhang
01 Jul 2025
Towards a Small Language Model Lifecycle Framework
Parsa Miraghaei
Sergio Moreschini
Antti Kolehmainen
David Hästbacka
09 Jun 2025
EPEE: Towards Efficient and Effective Foundation Models in Biomedicine
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Ziqiang Liu
Rui Zhang
03 Mar 2025
Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning
International Conference on Learning Representations (ICLR), 2024
Fengyu Gao
Ruida Zhou
T. Wang
Cong Shen
Jing Yang
15 Oct 2024
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
Ka Man Lo
Yiming Liang
Wenyu Du
Yuantao Fan
Zili Wang
Wenhao Huang
Lei Ma
Jie Fu
26 Feb 2024
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
Lucio Dery
Steven Kolawole
Jean-Francois Kagey
Virginia Smith
Graham Neubig
Ameet Talwalkar
08 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
05 Feb 2024
DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He
Tao Gui
Weiping Ding
Duoqian Miao
Jun Zhao
Liang Hu
LongBing Cao
03 Feb 2024
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
International Conference on Neural Information Processing (ICONIP), 2024
Wen-Chieh Liang
Youzhi Liang
29 Jan 2024
Grounding Foundation Models through Federated Transfer Learning: A General Framework
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Weijing Chen
Tao Fan
Hanlin Gu
Xiaojin Zhang
Lixin Fan
Qiang Yang
29 Nov 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
20 Oct 2023
Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT
Souvik Kundu
S. Nittur
Maciej Szankin
Sairam Sundaresan
14 Jul 2023
Low-Rank Prune-And-Factorize for Language Model Compression
International Conference on Language Resources and Evaluation (LREC), 2023
Siyu Ren
Kenny Q. Zhu
25 Jun 2023
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
International Conference on Machine Learning (ICML), 2023
Yixiao Li
Yifan Yu
Qingru Zhang
Chen Liang
Pengcheng He
Weizhu Chen
Tuo Zhao
20 Jun 2023
Coaching a Teachable Student
Computer Vision and Pattern Recognition (CVPR), 2023
Jimuyang Zhang
Zanming Huang
Eshed Ohn-Bar
16 Jun 2023
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shicheng Tan
Weng Lam Tam
Yuanchun Wang
Wenwen Gong
Shuo Zhao
Peng Zhang
Jie Tang
11 Jun 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
24 May 2023
F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiangxiang Gao
Wei-wei Zhu
Jiasheng Gao
Congrui Yin
21 May 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
International Conference on Learning Representations (ICLR), 2023
Chen Liang
Haoming Jiang
Zheng Li
Xianfeng Tang
Bin Yin
Tuo Zhao
19 Feb 2023
ZipLM: Inference-Aware Structured Pruning of Language Models
Neural Information Processing Systems (NeurIPS), 2023
Eldar Kurtic
Elias Frantar
Dan Alistarh
07 Feb 2023
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang
Yanda Chen
Zhou Yu
Kathleen McKeown
20 Dec 2022
Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection
Linfeng Zhang
Yukang Shi
Hung-Shuo Tai
Zhipeng Zhang
Yuan He
Ke Wang
Kaisheng Ma
14 Nov 2022
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tiannan Wang
Wangchunshu Zhou
Yan Zeng
Xinsong Zhang
14 Oct 2022
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
Chen Liang
Simiao Zuo
Qingru Zhang
Pengcheng He
Weizhu Chen
Tuo Zhao
04 Oct 2022
S4: a High-sparsity, High-performance AI Accelerator
Ian En-Hsu Yen
Zhibin Xiao
Dongkuan Xu
16 Jul 2022
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Neural Information Processing Systems (NeurIPS), 2022
Aidan Good
Jia-Huei Lin
Hannah Sieg
Mikey Ferguson
Xin Yu
Shandian Zhe
J. Wieczorek
Thiago Serra
07 Jun 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
01 Jun 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
30 May 2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
IEEE Transactions on Multimedia (IEEE TMM), 2022
Jun Rao
Xv Meng
Liang Ding
Shuhan Qi
Dacheng Tao
28 May 2022
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
James Lee-Thorp
Joshua Ainslie
24 May 2022
PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
Computer Vision and Pattern Recognition (CVPR), 2022
Linfeng Zhang
Runpei Dong
Hung-Shuo Tai
Kaisheng Ma
23 May 2022
Exploring Extreme Parameter Compression for Pre-trained Language Models
International Conference on Learning Representations (ICLR), 2022
Yuxin Ren
Benyou Wang
Lifeng Shang
Xin Jiang
Qun Liu
20 May 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
22 Apr 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Simiao Zuo
Qingru Zhang
Chen Liang
Pengcheng He
T. Zhao
Weizhu Chen
15 Apr 2022
Unified Visual Transformer Compression
International Conference on Learning Representations (ICLR), 2022
Shixing Yu
Tianlong Chen
Jiayi Shen
Huan Yuan
Jianchao Tan
Sen Yang
Ji Liu
Zinan Lin
15 Mar 2022
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Linfeng Zhang
Xin Chen
Xiaobing Tu
Pengfei Wan
N. Xu
Kaisheng Ma
12 Mar 2022
Representation Compensation Networks for Continual Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Chang-Bin Zhang
Jianqiang Xiao
Xialei Liu
Ying-Cong Chen
Mingg-Ming Cheng
10 Mar 2022
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Findings (Findings), 2022
Tianxiang Sun
Xiangyang Liu
Wei-wei Zhu
Zhichao Geng
Lingling Wu
Yilong He
Yuan Ni
Guotong Xie
Xuanjing Huang
Xipeng Qiu
03 Mar 2022
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar
Anthony Sarah
Sairam Sundaresan
24 Feb 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tao Ge
Si-Qing Chen
Furu Wei
16 Feb 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu
Julian McAuley
15 Feb 2022
Page 1 of 3