TinyBERT: Distilling BERT for Natural Language Understanding
Findings of EMNLP, 2020
arXiv 1909.10351, 23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Topics: VLM
Links: arXiv (abs) · PDF · HTML

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,056 citing papers (page 16 of 22)
VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Wu, Enhong Chen
08 Dec 2021

Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
Haofei Zhang, Jiarui Duan, Mengqi Xue, Mingli Song, Li Sun, Xiuming Zhang
Topics: ViT, AI4CE
07 Dec 2021

Causal Distillation for Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Zhengxuan Wu, Atticus Geiger, J. Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman
05 Dec 2021

Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz, Soroush Abbasi Koohpayegani, F. Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, Juergen Gall
Topics: ViT
30 Nov 2021

FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou
Topics: ViT, MQ
27 Nov 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Automatic Speech Recognition & Understanding (ASRU), 2021
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
22 Nov 2021

Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan, Xin Zhang, Sheng Zhang, Yan Pan, Lixiang Guo
Topics: MQ
22 Nov 2021

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length
Shira Guskin, Moshe Wasserblat, Ke Ding, Gyuwan Kim
18 Nov 2021

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Pengcheng He, Jianfeng Gao, Weizhu Chen
18 Nov 2021

Dynamically pruning segformer for efficient semantic segmentation
Haoli Bai, Hongda Mao, D. Nair
18 Nov 2021

Character-level HyperNetworks for Hate Speech Detection
Expert systems with applications (ESWA), 2021
Tomer Wullach, A. Adler, Einat Minkov
11 Nov 2021

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Jiangchao Yao, Shengyu Zhang, Yang Yao, Feng Wang, Jianxin Ma, ..., Kun Kuang, Chao-Xiang Wu, Leilei Gan, Jingren Zhou, Hongxia Yang
11 Nov 2021

Prune Once for All: Sparse Pre-Trained Language Models
Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
Topics: VLM
10 Nov 2021

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
07 Nov 2021

Sampling Equivariant Self-attention Networks for Object Detection in Aerial Images
IEEE Transactions on Image Processing (TIP), 2021
Guo-Ye Yang, Xiang-Li Li, Ralph Robert Martin, Shimin Hu
Topics: 3DPC
05 Nov 2021

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment
Weixin Xu, Zipeng Feng, Shuangkang Fang, Song Yuan, Yi Yang, Shuchang Zhou
Topics: MQ
01 Nov 2021

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Xuanli He, I. Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi
30 Oct 2021

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Neural Information Processing Systems (NeurIPS), 2021
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang
29 Oct 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
Neural Information Processing Systems (NeurIPS), 2021
Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu
28 Oct 2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
Gongfan Fang, Yifan Bao, Mingli Song, Xinchao Wang, Don Xie, Chengchao Shen, Xiuming Zhang
27 Oct 2021

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
Topics: AIMat
25 Oct 2021

Vis-TOP: Visual Transformer Overlay Processor
Wei Hu, Dian Xu, Zimeng Fan, Fang Liu, Yanxiang He
Topics: BDL, ViT
21 Oct 2021

BERMo: What can BERT learn from ELMo?
Sangamesh Kodge, Kaushik Roy
18 Oct 2021

Self-Supervised Representation Learning: Introduction, Advances and Challenges
Linus Ericsson, Henry Gouk, Chen Change Loy, Timothy M. Hospedales
Topics: SSL, OOD, AI4TS
18 Oct 2021

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
Chenhe Dong, Yaliang Li, Ying Shen, Minghui Qiu
Topics: VLM
16 Oct 2021

Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
16 Oct 2021

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora
Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew O. Arnold, Xiang Ren
Topics: KELM, CLL
16 Oct 2021

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi
16 Oct 2021

A Short Study on Compressing Decoder-Based Language Models
Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
16 Oct 2021

Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Helen Zhou, Ahmed Hassan Awadallah
16 Oct 2021

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm
Shaoyi Huang, Dongkuan Xu, Ian En-Hsu Yen, Yijue Wang, Sung-En Chang, ..., Shiyang Chen, Mimi Xie, Sanguthevar Rajasekaran, Hang Liu, Caiwen Ding
Topics: CLL, VLM
15 Oct 2021

Kronecker Decomposition for GPT Compression
Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh
15 Oct 2021

SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer
Topics: VLM, LRM
15 Oct 2021

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu
Topics: ELM
13 Oct 2021

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese
Zhuosheng Zhang, Hanqing Zhang, Keming Chen, Yuhang Guo, Jingyun Hua, Yulong Wang, Ming Zhou
Topics: VLM
13 Oct 2021

Pre-trained Language Models in Biomedical Domain: A Systematic Survey
ACM Computing Surveys (CSUR), 2021
Benyou Wang, Qianqian Xie, Jiahuan Pei, Zhihong Chen, Prayag Tiwari, Zhao Li, Jie Fu
Topics: LM&MA, AI4CE
11 Oct 2021

Global Vision Transformer Pruning with Hessian-Aware Saliency
Computer Vision and Pattern Recognition (CVPR), 2021
Huanrui Yang, Hongxu Yin, Maying Shen, Pavlo Molchanov, Hai Helen Li, Jan Kautz
Topics: ViT
10 Oct 2021

SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions
Vinod Ganesan, Gowtham Ramesh, Pratyush Kumar
10 Oct 2021

LIDSNet: A Lightweight on-device Intent Detection model using Deep Siamese Network
Vibhav Agarwal, Sudeep Deepak Shivnikar, Sourav Ghosh, H. Arora, Yash Saini
06 Oct 2021

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Heng-Jui Chang, Shu-Wen Yang, Hung-yi Lee
Topics: SSL
05 Oct 2021

Data Augmentation Approaches in Natural Language Processing: A Survey
Bohan Li, Yutai Hou, Wanxiang Che
05 Oct 2021

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Topics: MoE
05 Oct 2021

SDR: Efficient Neural Re-ranking using Succinct Document Representation
Nachshon Cohen, Amit Portnoy, B. Fetahu, A. Ingber
Topics: AI4TS
03 Oct 2021

SlovakBERT: Slovak Masked Language Model
Matúš Pikuliak, Stefan Grivalsky, Martin Konopka, Miroslav Blšták, Martin Tamajka, Viktor Bachratý, Marian Simko, Pavol Balázik, Michal Trnka, Filip Uhlárik
30 Sep 2021

Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu
Topics: MQ
30 Sep 2021

Deep Neural Compression Via Concurrent Pruning and Self-Distillation
J. Ó. Neill, Sourav Dutta, H. Assem
Topics: VLM
30 Sep 2021

FQuAD2.0: French Question Answering and knowing that you know nothing
Quentin Heinrich, Gautier Viaud, Wacim Belblidia
27 Sep 2021

Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
Topics: MQ
27 Sep 2021

Improving Question Answering Performance Using Knowledge Distillation and Active Learning
Engineering applications of artificial intelligence (EAAI), 2021
Yasaman Boreshban, Seyed Morteza Mirbostani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel, Shahin Amiriparian
26 Sep 2021

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
Cristobal Eyzaguirre, Felipe del-Rio, Vladimir Araujo, Alvaro Soto
24 Sep 2021