TinyBERT: Distilling BERT for Natural Language Understanding
Findings, 2019
23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,055 citing papers
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Patrick Lewis
Pontus Stenetorp
Sebastian Riedel
235
196
0
06 Aug 2020
ConvBERT: Improving BERT with Span-based Dynamic Convolution
Neural Information Processing Systems (NeurIPS), 2020
Zihang Jiang
Weihao Yu
Daquan Zhou
Yunpeng Chen
Jiashi Feng
Shuicheng Yan
318
195
0
06 Aug 2020
Understanding BERT Rankers Under Distillation
Luyu Gao
Zhuyun Dai
Jamie Callan
144
55
0
21 Jul 2020
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola
Albert Eaton Shaw
Ravi Krishna
Kurt Keutzer
230
136
0
19 Jun 2020
Knowledge Distillation: A Survey
Jianping Gou
B. Yu
Stephen J. Maybank
Dacheng Tao
1.6K
3,631
0
09 Jun 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou
Canwen Xu
Tao Ge
Julian McAuley
Ke Xu
Furu Wei
294
394
0
07 Jun 2020
Accelerating Natural Language Understanding in Task-Oriented Dialog
Ojas Ahuja
Shrey Desai
115
1
0
05 Jun 2020
An Overview of Neural Network Compression
James O'Neill
292
112
0
05 Jun 2020
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
2.0K
51,554
0
28 May 2020
Adversarial NLI for Factual Correctness in Text Summarisation Models
Mario Barrantes
Benedikt Herudek
Richard Wang
95
18
0
24 May 2020
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
D. Gao
Linbo Jin
Ben Chen
Minghui Qiu
Peng Li
Yi Wei
Yitao Hu
Haozhe Jasper Wang
193
146
0
20 May 2020
Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition
Yan Gao
Titouan Parcollet
Nicholas D. Lane
166
15
0
19 May 2020
Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation
Won Ik Cho
Donghyun Kwak
J. Yoon
N. Kim
229
27
0
17 May 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
295
549
0
15 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang
Hai Zhao
Rui Wang
192
66
0
13 May 2020
Distilling Knowledge from Pre-trained Language Models via Text Smoothing
Xing Wu
Zichen Liu
Xiangyang Zhou
Dianhai Yu
124
6
0
08 May 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh
Isak Edo
Omar Mohamed Awad
Andreas Moshovos
258
208
0
08 May 2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Qingqing Cao
H. Trivedi
A. Balasubramanian
Niranjan Balasubramanian
171
70
0
02 May 2020
When BERT Plays the Lottery, All Tickets Are Winning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Sai Prasanna
Anna Rogers
Anna Rumshisky
256
199
0
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
613
536
0
01 May 2020
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Ji Xin
Raphael Tang
Jaejun Lee
Yaoliang Yu
Jimmy J. Lin
197
430
0
27 Apr 2020
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2020
Omar Khattab
Matei A. Zaharia
357
1,714
0
27 Apr 2020
Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order
Yi-Lun Liao
Xin Jiang
Qun Liu
104
41
0
24 Apr 2020
The Cost of Training NLP Models: A Concise Overview
Or Sharir
Barak Peleg
Y. Shoham
203
229
0
19 Apr 2020
The Right Tool for the Job: Matching Model and Instance Complexities
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Roy Schwartz
Gabriel Stanovsky
Swabha Swayamdipta
Jesse Dodge
Noah A. Smith
297
177
0
16 Apr 2020
Training with Quantization Noise for Extreme Model Compression
International Conference on Learning Representations (ICLR), 2020
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Remi Gribonval
Armand Joulin
239
256
0
15 Apr 2020
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Subhabrata Mukherjee
Ahmed Hassan Awadallah
191
62
0
12 Apr 2020
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
International Conference on Computational Linguistics (COLING), 2020
Yihuan Mao
Yujing Wang
Chufan Wu
Chen Zhang
Yang-Feng Wang
Yaming Yang
Quanlu Zhang
Yunhai Tong
Jing Bai
139
80
0
08 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Neural Information Processing Systems (NeurIPS), 2020
Lu Hou
Zhiqi Huang
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
242
352
0
08 Apr 2020
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Xinyu Wang
Yong Jiang
Nguyen Bach
Tao Wang
Fei Huang
Kewei Tu
232
38
0
08 Apr 2020
On the Effect of Dropping Layers of Pre-trained Transformer Models
Computer Speech and Language (CSL), 2020
Hassan Sajjad
Fahim Dalvi
Nadir Durrani
Preslav Nakov
259
172
0
08 Apr 2020
Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
Bowen Wu
Huan Zhang
Mengyuan Li
Zongsheng Wang
Qihang Feng
Junhong Huang
Baoxun Wang
133
4
0
07 Apr 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
359
916
0
06 Apr 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Haotang Deng
Qi Ju
228
392
0
05 Apr 2020
Pre-trained Models for Natural Language Processing: A Survey
Science China Technological Sciences (Sci China Technol Sci), 2020
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
933
1,607
0
18 Mar 2020
A Survey on Contextual Embeddings
Qi Liu
Matt J. Kusner
Phil Blunsom
433
169
0
16 Mar 2020
TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Ziqing Yang
Yiming Cui
Zhipeng Chen
Wanxiang Che
Ting Liu
Shijin Wang
Guoping Hu
179
50
0
28 Feb 2020
A Primer in BERTology: What we know about how BERT works
Transactions of the Association for Computational Linguistics (TACL), 2020
Anna Rogers
Olga Kovaleva
Anna Rumshisky
393
1,690
0
27 Feb 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Transactions of the Association for Computational Linguistics (TACL), 2020
Prakhar Ganesh
Yao Chen
Xin Lou
Mohammad Ali Khan
Yifan Yang
Hassan Sajjad
Preslav Nakov
Deming Chen
Marianne Winslett
393
213
0
27 Feb 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Neural Information Processing Systems (NeurIPS), 2020
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
833
1,707
0
25 Feb 2020
Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Journal of Computer Science and Technology (JCST), 2020
Yige Xu
Xipeng Qiu
L. Zhou
Xuanjing Huang
135
73
0
24 Feb 2020
TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu
Jian Jiao
Ruofei Zhang
171
52
0
14 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
644
216
0
07 Feb 2020
Aligning the Pretraining and Finetuning Objectives of Language Models
Nuo Wang Pierse
Jing Lu
95
2
0
05 Feb 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Daoyuan Chen
Yaliang Li
Minghui Qiu
Zhen Wang
Bofang Li
Bolin Ding
Hongbo Deng
Yanjie Liang
Jialin Li
Jingren Zhou
209
106
0
13 Jan 2020
The State of Knowledge Distillation for Classification
Fabian Ruffy
K. Chahal
162
21
0
20 Dec 2019
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian
A. Kreuzer
Pai-Hung Chen
Hans-Martin Will
162
3
0
13 Dec 2019
Unsupervised Pre-training for Natural Language Generation: A Literature Review
Yuanxin Liu
Zheng Lin
110
5
0
13 Nov 2019
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu
Haiquan Wang
Jimmy J. Lin
R. Socher
Caiming Xiong
178
23
0
09 Nov 2019
Blockwise Self-Attention for Long Document Understanding
Findings, 2019
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
272
269
0
07 Nov 2019