arXiv:2002.02925
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
7 February 2020
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
Papers citing "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (50 of 102 papers shown)
Network of Theseus (like the ship)
Vighnesh Subramaniam
C. Conwell
Boris Katz
Andrei Barbu
Brian Cheung
03 Dec 2025
Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers
Rowan Bradbury
Aniket Srinivasan Ashok
Sai Ram Kasanagottu
Gunmay Jhingran
Shuai Meng
24 Nov 2025
A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal
Mrigank Rochan
Chanchal K. Roy
07 Nov 2025
Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion
Xianjun Gao
Jianchun Liu
Hongli Xu
Liusheng Huang
28 Oct 2025
SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions
Ziyi Wang
Nan Jiang
Guang Lin
Qifan Song
10 Oct 2025
CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev
Denis Makhov
Magauiya Zhussip
Ammar Ali
Stamatios Lefkimmiatis
26 Sep 2025
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
Yingming Zheng
Hanqi Li
Kai Yu
Lu Chen
23 Sep 2025
An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang
Zezhou Yang
Cuiyun Gao
Xin Xia
Qing Liao
21 Aug 2025
Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints
Sandeep Reddy
Kabir Khan
Rohit Patil
Ananya Chakraborty
Faizan A. Khan
Swati Kulkarni
Arjun Verma
Neha Singh
14 Aug 2025
General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong
Jinglun Li
Xinyu Zhou
Shilin Yan
Pinxue Guo
...
Runze Li
Xingdong Sheng
Wei Zhang
Hong Lu
Wenqiang Zhang
01 Jul 2025
Towards a Small Language Model Lifecycle Framework
Parsa Miraghaei
Sergio Moreschini
Antti Kolehmainen
David Hästbacka
09 Jun 2025
EPEE: Towards Efficient and Effective Foundation Models in Biomedicine
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Ziqiang Liu
Rui Zhang
03 Mar 2025
Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning
International Conference on Learning Representations (ICLR), 2024
Fengyu Gao
Ruida Zhou
T. Wang
Cong Shen
Jing Yang
15 Oct 2024
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
Ka Man Lo
Yiming Liang
Wenyu Du
Yuantao Fan
Zili Wang
Wenhao Huang
Lei Ma
Jie Fu
26 Feb 2024
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
Lucio Dery
Steven Kolawole
Jean-Francois Kagey
Virginia Smith
Graham Neubig
Ameet Talwalkar
08 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
05 Feb 2024
DE^3-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He
Tao Gui
Weiping Ding
Duoqian Miao
Jun Zhao
Liang Hu
LongBing Cao
03 Feb 2024
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
International Conference on Neural Information Processing (ICONIP), 2024
Wen-Chieh Liang
Youzhi Liang
29 Jan 2024
Grounding Foundation Models through Federated Transfer Learning: A General Framework
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Weijing Chen
Tao Fan
Hanlin Gu
Xiaojin Zhang
Lixin Fan
Qiang Yang
29 Nov 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Ruida Wang
Wangchunshu Zhou
Mrinmaya Sachan
20 Oct 2023
Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT
Souvik Kundu
S. Nittur
Maciej Szankin
Sairam Sundaresan
14 Jul 2023
Low-Rank Prune-And-Factorize for Language Model Compression
International Conference on Language Resources and Evaluation (LREC), 2023
Siyu Ren
Kenny Q. Zhu
25 Jun 2023
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
International Conference on Machine Learning (ICML), 2023
Yixiao Li
Yifan Yu
Qingru Zhang
Chen Liang
Pengcheng He
Weizhu Chen
Tuo Zhao
20 Jun 2023
Coaching a Teachable Student
Computer Vision and Pattern Recognition (CVPR), 2023
Jimuyang Zhang
Zanming Huang
Eshed Ohn-Bar
16 Jun 2023
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shicheng Tan
Weng Lam Tam
Yuanchun Wang
Wenwen Gong
Shuo Zhao
Peng Zhang
Jie Tang
11 Jun 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
24 May 2023
F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiangxiang Gao
Wei-wei Zhu
Jiasheng Gao
Congrui Yin
21 May 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
International Conference on Learning Representations (ICLR), 2023
Chen Liang
Haoming Jiang
Zheng Li
Xianfeng Tang
Bin Yin
Tuo Zhao
19 Feb 2023
ZipLM: Inference-Aware Structured Pruning of Language Models
Neural Information Processing Systems (NeurIPS), 2023
Eldar Kurtic
Elias Frantar
Dan Alistarh
07 Feb 2023
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang
Yanda Chen
Zhou Yu
Kathleen McKeown
20 Dec 2022
Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection
Linfeng Zhang
Yukang Shi
Hung-Shuo Tai
Zhipeng Zhang
Yuan He
Ke Wang
Kaisheng Ma
14 Nov 2022
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tiannan Wang
Wangchunshu Zhou
Yan Zeng
Xinsong Zhang
14 Oct 2022
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
Chen Liang
Simiao Zuo
Qingru Zhang
Pengcheng He
Weizhu Chen
Tuo Zhao
04 Oct 2022
S4: a High-sparsity, High-performance AI Accelerator
Ian En-Hsu Yen
Zhibin Xiao
Dongkuan Xu
16 Jul 2022
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Neural Information Processing Systems (NeurIPS), 2022
Aidan Good
Jia-Huei Lin
Hannah Sieg
Mikey Ferguson
Xin Yu
Shandian Zhe
J. Wieczorek
Thiago Serra
07 Jun 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
01 Jun 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
30 May 2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
IEEE transactions on multimedia (IEEE TMM), 2022
Jun Rao
Xv Meng
Liang Ding
Shuhan Qi
Dacheng Tao
28 May 2022
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
James Lee-Thorp
Joshua Ainslie
24 May 2022
PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
Computer Vision and Pattern Recognition (CVPR), 2022
Linfeng Zhang
Runpei Dong
Hung-Shuo Tai
Kaisheng Ma
23 May 2022
Exploring Extreme Parameter Compression for Pre-trained Language Models
International Conference on Learning Representations (ICLR), 2022
Yuxin Ren
Benyou Wang
Lifeng Shang
Xin Jiang
Qun Liu
20 May 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
22 Apr 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Simiao Zuo
Qingru Zhang
Chen Liang
Pengcheng He
T. Zhao
Weizhu Chen
15 Apr 2022
Unified Visual Transformer Compression
International Conference on Learning Representations (ICLR), 2022
Shixing Yu
Tianlong Chen
Jiayi Shen
Huan Yuan
Jianchao Tan
Sen Yang
Ji Liu
Zinan Lin
15 Mar 2022
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Linfeng Zhang
Xin Chen
Xiaobing Tu
Pengfei Wan
N. Xu
Kaisheng Ma
12 Mar 2022
Representation Compensation Networks for Continual Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Chang-Bin Zhang
Jianqiang Xiao
Xialei Liu
Ying-Cong Chen
Mingg-Ming Cheng
10 Mar 2022
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Findings (Findings), 2022
Tianxiang Sun
Xiangyang Liu
Wei-wei Zhu
Zhichao Geng
Lingling Wu
Yilong He
Yuan Ni
Guotong Xie
Xuanjing Huang
Xipeng Qiu
03 Mar 2022
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar
Anthony Sarah
Sairam Sundaresan
24 Feb 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Tao Ge
Si-Qing Chen
Furu Wei
16 Feb 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu
Julian McAuley
15 Feb 2022
Page 1 of 3