TinyBERT: Distilling BERT for Natural Language Understanding
Findings of EMNLP, 2020 (arXiv: 23 September 2019)
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu. [VLM]

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,055 citing papers.
Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration. AAAI, 2020. Lei Sha, Oana-Maria Camburu, Thomas Lukasiewicz. 16 Dec 2020.
A Lightweight Neural Model for Biomedical Entity Linking. AAAI, 2020. Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek. 16 Dec 2020. [MedIm]
EmpLite: A Lightweight Sequence Labeling Model for Emphasis Selection of Short Texts. ICON, 2020. Vibhav Agarwal, Sourav Ghosh, Kranti Chalamalasetti, B. Challa, S. Kumari, Harshavardhana, Barath Raj Kandur Raja. 15 Dec 2020.
Parameter-Efficient Transfer Learning with Diff Pruning. ACL, 2020. Demi Guo, Alexander M. Rush, Yoon Kim. 14 Dec 2020.
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding. AAAI, 2020. Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li. 14 Dec 2020.
MiniVLM: A Smaller and Faster Vision-Language Model. Jianfeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Guang Dai, Jianfeng Gao, Zicheng Liu. 13 Dec 2020. [VLM, MLLM]
Reinforced Multi-Teacher Selection for Knowledge Distillation. AAAI, 2020. Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang. 11 Dec 2020.
Improving Task-Agnostic BERT Distillation with Layer Mapping Search. Neurocomputing, 2020. Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. 11 Dec 2020.
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains. ACL, 2020. Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Yanjie Liang. 02 Dec 2020.
CPM: A Large-scale Generative Chinese Pre-trained Language Model. AI Open, 2020. Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, ..., Wentao Han, Jie Tang, Juan-Zi Li, Xiaoyan Zhu, Maosong Sun. 01 Dec 2020.
A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models. J. Ku, Jihun Oh, Youngyoon Lee, Gaurav Pooniwala, Sangjeong Lee. 30 Nov 2020.
Bringing AI To Edge: From Deep Learning's Perspective. Neurocomputing, 2020. Di Liu, Hao Kong, Xiangzhong Luo, Weichen Liu, Ravi Subramaniam. 25 Nov 2020.
EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications. CIKM, 2020. Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Yanjie Liang, Deng Cai, Jialin Li. 18 Nov 2020. [VLM, SyDa]
Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads. Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun. 07 Nov 2020. [VLM]
Influence Patterns for Explaining Information Flow in BERT. NeurIPS, 2020. Kaiji Lu, Zifan Wang, Piotr (Peter) Mardziel, Anupam Datta. 02 Nov 2020. [GNN]
FastFormers: Highly Efficient Transformer Models for Natural Language Understanding. Young Jin Kim, Hany Awadalla. 26 Oct 2020. [AI4CE]
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. NeurIPS, 2020. Minjia Zhang, Yuxiong He. 26 Oct 2020. [AI4CE]
Pre-trained Summarization Distillation. Sam Shleifer, Alexander M. Rush. 24 Oct 2020.
Knowledge Distillation for Improved Accuracy in Spoken Question Answering. Chenyu You, Polydoros Giannouris, Yuexian Zou. 21 Oct 2020.
Optimal Subarchitecture Extraction For BERT. Adrian de Wynter, Daniel J. Perry. 20 Oct 2020. [MQ]
BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search. Industrial Conference on Data Mining, 2020. Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng P. Yan, Di Jin. 20 Oct 2020.
HABERTOR: An Efficient and Effective Deep Hatespeech Detector. EMNLP, 2020. T. Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Serim Park. 17 Oct 2020. [VLM]
AutoADR: Automatic Model Design for Ad Relevance. Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, Ruofei Zhang. 14 Oct 2020.
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression. Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin. 14 Oct 2020.
Pretrained Transformers for Text Ranking: BERT and Beyond. Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates. 13 Oct 2020. [VLM]
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance. Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin. 13 Oct 2020.
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually). Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman. 11 Oct 2020. [SSL, AI4CE]
Adversarial Self-Supervised Data-Free Distillation for Text Classification. EMNLP, 2020. Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu. 10 Oct 2020.
Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding. Interspeech, 2020. Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li. 09 Oct 2020.
Deep Learning Meets Projective Clustering. ICLR, 2020. Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman. 08 Oct 2020.
On the importance of pre-training data volume for compact language models. Vincent Micheli, Martin d'Hoffschmidt, François Fleuret. 08 Oct 2020.
AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models. Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, A. Raghunathan. 07 Oct 2020.
CATBERT: Context-Aware Tiny BERT for Detecting Social Engineering Emails. Younghoon Lee, Joshua Saxe, Richard E. Harang. 07 Oct 2020.
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers. Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu. 06 Oct 2020. [MoE]
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior. Findings, 2020. Zi Lin, Jeremiah Zhe Liu, Ziao Yang, Nan Hua, Dan Roth. 05 Oct 2020.
Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT. Ikhyun Cho, U. Kang. 30 Sep 2020.
Contrastive Distillation on Intermediate Representations for Language Model Compression. S. Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu. 29 Sep 2020. [VLM]
TernaryBERT: Distillation-aware Ultra-low Bit BERT. EMNLP, 2020. Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu. 27 Sep 2020. [MQ]
A little goes a long way: Improving toxic language classification despite data scarcity. Findings, 2020. Mika Juuti, Tommi Gröndahl, Adrian Flanagan, Nirmal Asokan. 25 Sep 2020.
RecoBERT: A Catalog Language Model for Text-Based Recommendations. Findings, 2020. Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein. 25 Sep 2020.
Hierarchical Pre-training for Sequence Labelling in Spoken Dialog. Findings, 2020. E. Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloé Clavel. 23 Sep 2020.
Weight Distillation: Transferring the Knowledge in Neural Network Parameters. ACL, 2020. Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu. 19 Sep 2020.
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning. Findings, 2020. Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Hao Sun, Hang Liu, Caiwen Ding. 17 Sep 2020. [VLM]
Simplified TinyBERT: Knowledge Distillation for Document Retrieval. ECIR, 2020. Xuanang Chen, Xianpei Han, Kai Hui, Le Sun, Yingfei Sun. 16 Sep 2020.
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. NAACL, 2020. Timo Schick, Hinrich Schütze. 15 Sep 2020.
Real-Time Execution of Large-scale Language Models on Mobile. Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang. 15 Sep 2020. [MQ]
Efficient Transformers: A Survey. ACM Computing Surveys, 2020. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 14 Sep 2020. [VLM]
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation. M. Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman. 11 Sep 2020.
Pay Attention when Required. Swetha Mandava, Szymon Migacz, A. Fit-Florea. 09 Sep 2020.
Compression of Deep Learning Models for Text: A Survey. ACM TKDD, 2020. Manish Gupta, Puneet Agrawal. 12 Aug 2020. [VLM, MedIm, AI4CE]