BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
7 February 2020
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou

Papers citing "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing"

50 / 102 papers shown
Network of Theseus (like the ship)
Vighnesh Subramaniam
C. Conwell
Boris Katz
Andrei Barbu
Brian Cheung
03 Dec 2025
Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers
Rowan Bradbury
Aniket Srinivasan Ashok
Sai Ram Kasanagottu
Gunmay Jhingran
Shuai Meng
24 Nov 2025
A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal
Mrigank Rochan
Chanchal K. Roy
07 Nov 2025
Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion
Xianjun Gao
Jianchun Liu
Hongli Xu
Liusheng Huang
28 Oct 2025
SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions
Ziyi Wang
Nan Jiang
Guang Lin
Qifan Song
10 Oct 2025
CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev
Denis Makhov
Magauiya Zhussip
Ammar Ali
Stamatios Lefkimmiatis
26 Sep 2025
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
Yingming Zheng
Hanqi Li
Kai Yu
Lu Chen
23 Sep 2025
An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang
Zezhou Yang
Cuiyun Gao
Xin Xia
Qing Liao
21 Aug 2025
Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints
Sandeep Reddy
Kabir Khan
Rohit Patil
Ananya Chakraborty
Faizan A. Khan
Swati Kulkarni
Arjun Verma
Neha Singh
14 Aug 2025
General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong
Jinglun Li
Xinyu Zhou
Shilin Yan
Pinxue Guo
...
Runze Li
Xingdong Sheng
Wei Zhang
Hong Lu
Wenqiang Zhang
01 Jul 2025
Towards a Small Language Model Lifecycle Framework
Parsa Miraghaei
Sergio Moreschini
Antti Kolehmainen
David Hästbacka
09 Jun 2025
EPEE: Towards Efficient and Effective Foundation Models in Biomedicine
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Ziqiang Liu
Rui Zhang
03 Mar 2025
Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning
International Conference on Learning Representations (ICLR), 2024
Fengyu Gao
Ruida Zhou
T. Wang
Cong Shen
Jing Yang
15 Oct 2024
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
Ka Man Lo
Yiming Liang
Wenyu Du
Yuantao Fan
Zili Wang
Wenhao Huang
Lei Ma
Jie Fu
26 Feb 2024
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
Lucio Dery
Steven Kolawole
Jean-Francois Kagey
Virginia Smith
Graham Neubig
Ameet Talwalkar
08 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
05 Feb 2024
DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He
Tao Gui
Weiping Ding
Duoqian Miao
Jun Zhao
Liang Hu
LongBing Cao
03 Feb 2024
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
International Conference on Neural Information Processing (ICONIP), 2024
Wen-Chieh Liang
Youzhi Liang
29 Jan 2024
Grounding Foundation Models through Federated Transfer Learning: A General Framework
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Weijing Chen
Tao Fan
Hanlin Gu
Xiaojin Zhang
Lixin Fan
Qiang Yang
29 Nov 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
20 Oct 2023
Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT
Souvik Kundu
S. Nittur
Maciej Szankin
Sairam Sundaresan
14 Jul 2023
Low-Rank Prune-And-Factorize for Language Model Compression
International Conference on Language Resources and Evaluation (LREC), 2023
Siyu Ren
Kenny Q. Zhu
25 Jun 2023
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
International Conference on Machine Learning (ICML), 2023
Yixiao Li
Yifan Yu
Qingru Zhang
Chen Liang
Pengcheng He
Weizhu Chen
Tuo Zhao
20 Jun 2023
Coaching a Teachable Student
Computer Vision and Pattern Recognition (CVPR), 2023
Jimuyang Zhang
Zanming Huang
Eshed Ohn-Bar
16 Jun 2023
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shicheng Tan
Weng Lam Tam
Yuanchun Wang
Wenwen Gong
Shuo Zhao
Peng Zhang
Jie Tang
11 Jun 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
24 May 2023
F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiangxiang Gao
Wei-wei Zhu
Jiasheng Gao
Congrui Yin
21 May 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
International Conference on Learning Representations (ICLR), 2023
Chen Liang
Haoming Jiang
Zheng Li
Xianfeng Tang
Bin Yin
Tuo Zhao
19 Feb 2023
ZipLM: Inference-Aware Structured Pruning of Language Models
Neural Information Processing Systems (NeurIPS), 2023
Eldar Kurtic
Elias Frantar
Dan Alistarh
07 Feb 2023
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang
Yanda Chen
Zhou Yu
Kathleen McKeown
20 Dec 2022
Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection
Linfeng Zhang
Yukang Shi
Hung-Shuo Tai
Zhipeng Zhang
Yuan He
Ke Wang
Kaisheng Ma
14 Nov 2022
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tiannan Wang
Wangchunshu Zhou
Yan Zeng
Xinsong Zhang
14 Oct 2022
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
Chen Liang
Simiao Zuo
Qingru Zhang
Pengcheng He
Weizhu Chen
Tuo Zhao
04 Oct 2022
S4: a High-sparsity, High-performance AI Accelerator
Ian En-Hsu Yen
Zhibin Xiao
Dongkuan Xu
16 Jul 2022
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Neural Information Processing Systems (NeurIPS), 2022
Aidan Good
Jia-Huei Lin
Hannah Sieg
Mikey Ferguson
Xin Yu
Shandian Zhe
J. Wieczorek
Thiago Serra
07 Jun 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
01 Jun 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
30 May 2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
IEEE Transactions on Multimedia (IEEE TMM), 2022
Jun Rao
Xv Meng
Liang Ding
Shuhan Qi
Dacheng Tao
28 May 2022
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
James Lee-Thorp
Joshua Ainslie
24 May 2022
PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
Computer Vision and Pattern Recognition (CVPR), 2022
Linfeng Zhang
Runpei Dong
Hung-Shuo Tai
Kaisheng Ma
23 May 2022
Exploring Extreme Parameter Compression for Pre-trained Language Models
International Conference on Learning Representations (ICLR), 2022
Yuxin Ren
Benyou Wang
Lifeng Shang
Xin Jiang
Qun Liu
20 May 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
22 Apr 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Simiao Zuo
Qingru Zhang
Chen Liang
Pengcheng He
T. Zhao
Weizhu Chen
15 Apr 2022
Unified Visual Transformer Compression
International Conference on Learning Representations (ICLR), 2022
Shixing Yu
Tianlong Chen
Jiayi Shen
Huan Yuan
Jianchao Tan
Sen Yang
Ji Liu
Zinan Lin
15 Mar 2022
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Linfeng Zhang
Xin Chen
Xiaobing Tu
Pengfei Wan
N. Xu
Kaisheng Ma
12 Mar 2022
Representation Compensation Networks for Continual Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Chang-Bin Zhang
Jianqiang Xiao
Xialei Liu
Ying-Cong Chen
Mingg-Ming Cheng
10 Mar 2022
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Findings (Findings), 2022
Tianxiang Sun
Xiangyang Liu
Wei-wei Zhu
Zhichao Geng
Lingling Wu
Yilong He
Yuan Ni
Guotong Xie
Xuanjing Huang
Xipeng Qiu
03 Mar 2022
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar
Anthony Sarah
Sairam Sundaresan
24 Feb 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tao Ge
Si-Qing Chen
Furu Wei
16 Feb 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu
Julian McAuley
15 Feb 2022
Page 1 of 3