TinyBERT: Distilling BERT for Natural Language Understanding

Findings (Findings), 2019

23 September 2019

Xiaoqi Jiao

Yichun Yin

Lifeng Shang

Xin Jiang

Linlin Li

Qun Liu

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

50 / 1,056 papers shown

Learned Token Pruning for Transformers

Sehoon Kim

356

194

02 Jul 2021

Knowledge Distillation for Quality Estimation

Amit Gajbhiye

M. Fomicheva

Fernando Alva-Manchego

247

01 Jul 2021

Elbert: Fast Albert with Confidence-Window Based Early Exit

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

184

01 Jul 2021

On the Interaction of Belief Bias and Explanations

Findings (Findings), 2021

225

29 Jun 2021

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

232

25 Jun 2021

Data Augmentation for Opcode Sequence Based Malware Detection

Niall McLaughlin

Jesus Martinez del Rincon

128

22 Jun 2021

LV-BERT: Exploiting Layer Variety for BERT

Findings (Findings), 2021

Weihao Yu

156

22 Jun 2021

Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic

Recent Advances in Natural Language Processing (RANLP), 2021

322

22 Jun 2021

Direction is what you need: Improving Word Embedding Compression in Large Language Models

120

15 Jun 2021

Pre-Trained Models: Past, Present and Future

AI Open (AO), 2021

Xu Han

Zhengyan Zhang

Ning Ding

Yuxian Gu

Xiao Liu

...

Jun Zhu

392

998

14 Jun 2021

Why Can You Lay Off Heads? Investigating How BERT Heads Transfer

Ting-Rui Chiang

Yun-Nung Chen

14 Jun 2021

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

Computer Vision and Pattern Recognition (CVPR), 2021

Mingyu Ding

Xiaochen Lian

Linjie Yang

Peng Wang

Xiaojie Jin

Zhiwu Lu

Ping Luo

ViT

242

11 Jun 2021

Generate, Annotate, and Learn: NLP with Synthetic Text

Transactions of the Association for Computational Linguistics (TACL), 2021

326

11 Jun 2021

RefBERT: Compressing BERT by Referencing to Pre-computed Representations

IEEE International Joint Conference on Neural Network (IJCNN), 2021

168

11 Jun 2021

Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Yuanxin Liu

Fandong Meng

Zheng Lin

Weiping Wang

Jie Zhou

10 Jun 2021

AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

156

10 Jun 2021

BERT Learns to Teach: Knowledge Distillation with Meta Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Wangchunshu Zhou

Canwen Xu

Julian McAuley

321

107

08 Jun 2021

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation

Subhabrata Mukherjee

Ahmed Hassan Awadallah

Jianfeng Gao

241

08 Jun 2021

Multi-hop Graph Convolutional Network with High-order Chebyshev Approximation for Text Reasoning

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Shuoran Jiang

Qingcai Chen

Xin Liu

Baotian Hu

Lisai Zhang

124

08 Jun 2021

RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models

Xin Guo

114

07 Jun 2021

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

Mengdi Wang

Shen Li

Jun Yang

Rongrong Ji

178

04 Jun 2021

ERNIE-Tiny : A Progressive Distillation Framework for Pretrained Transformer Compression

194

04 Jun 2021

Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

108

04 Jun 2021

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

Neural Information Processing Systems (NeurIPS), 2021

Wenliang Zhao

Jie Zhou

529

932

03 Jun 2021

One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers

Findings (Findings), 2021

Chuhan Wu

Fangzhao Wu

Yongfeng Huang

182

02 Jun 2021

Towards Quantifiable Dialogue Coherence Evaluation

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Xiaodan Liang

148

01 Jun 2021

DoT: An efficient Double Transformer for NLP tasks with tables

Findings (Findings), 2021

Syrine Krichene

Thomas Müller

Julian Martin Eisenschlos

203

01 Jun 2021

Distribution Matching for Rationalization

AAAI Conference on Artificial Intelligence (AAAI), 2021

172

01 Jun 2021

Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

Shuai Bai

Chang Zhou

Yi Yang

Hongxia Yang

240

31 May 2021

Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing

Pattern Recognition Letters (PR), 2021

197

31 May 2021

LEAP: Learnable Pruning for Transformer-based Models

Yuxiong He

214

30 May 2021

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

Knowledge Discovery and Data Mining (KDD), 2021

Xu Tan

152

30 May 2021

Knowledge Inheritance for Pre-trained Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Yujia Qin

Yankai Lin

Jing Yi

Jiajie Zhang

Xu Han

...

Yusheng Su

Zhiyuan Liu

Peng Li

Maosong Sun

Jie Zhou

VLM

240

28 May 2021

Accelerating BERT Inference for Sequence Labeling via Early-Exit

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Xiaonan Li

Yunfan Shao

Tianxiang Sun

Hang Yan

Xipeng Qiu

Xuanjing Huang

279

28 May 2021

Lightweight Cross-Lingual Sentence Representation Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Sadao Kurohashi

334

28 May 2021

Early Exiting with Ensemble Internal Classifiers

Tianxiang Sun

Xiangyang Liu

Xuanjing Huang

Xipeng Qiu

157

28 May 2021

Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax

Findings (Findings), 2021

200

28 May 2021

Selective Knowledge Distillation for Neural Machine Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

Fusheng Wang

Jianhao Yan

Fandong Meng

Jie Zhou

203

27 May 2021

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Deming Ye

Yankai Lin

Yufei Huang

Maosong Sun

207

25 May 2021

Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021

183

20 May 2021

BERT Busters: Outlier Dimensions that Disrupt Transformers

Findings (Findings), 2021

453

112

14 May 2021

Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters

Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2021

Andrea Madotto

183

13 May 2021

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

248

12 May 2021

FNet: Mixing Tokens with Fourier Transforms

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

James Lee-Thorp

Joshua Ainslie

Ilya Eckstein

Santiago Ontanon

658

645

09 May 2021

Easy and Efficient Transformer : Scalable Inference Solution For large NLP model

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Changjie Fan

Zeng Zhao

270

26 Apr 2021

Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation

International Conference on Artificial Neural Networks (ICANN), 2021

Lifeng Shang

Xin Jiang

Qun Liu

149

24 Apr 2021

Disfluency Detection with Unlabeled Data and Small BERT Models

Interspeech (Interspeech), 2021

175

21 Apr 2021

Review of end-to-end speech synthesis technology based on deep learning

216

20 Apr 2021

Knowledge Distillation as Semiparametric Inference

International Conference on Learning Representations (ICLR), 2021

233

20 Apr 2021

Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Dongkuan Xu

193

18 Apr 2021