TinyBERT: Distilling BERT for Natural Language Understanding

Findings (Findings), 2019
23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
    VLM

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

50 / 1,056 papers shown
Dynamic Knowledge Distillation for Pre-trained Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Lei Li
Yankai Lin
Shuhuai Ren
Peng Li
Jie Zhou
Xu Sun
251
58
0
23 Sep 2021
Distiller: A Systematic Study of Model Distillation Methods in Natural Language Processing
Haoyu He
Xingjian Shi
Jonas W. Mueller
Zha Sheng
Mu Li
George Karypis
139
10
0
23 Sep 2021
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar
Nithin Anchuri
Mehdi Rezagholizadeh
Abbas Ghaddar
Philippe Langlais
Pascal Poupart
329
26
0
21 Sep 2021
Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj
Abbas Ghaddar
Ahmad Rashid
Khalil Bibi
Cheng-huan Li
A. Ghodsi
Philippe Langlais
Mehdi Rezagholizadeh
174
2
0
21 Sep 2021
Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications
Shuo Sun
Ahmed El-Kishky
Vishrav Chaudhary
James Cross
Francisco Guzmán
Lucia Specia
140
1
0
17 Sep 2021
General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings
Lukas Galke
Isabelle Cuber
Christophe Meyer
Henrik Ferdinand Nolscher
Angelina Sonderecker
A. Scherp
264
2
0
17 Sep 2021
Distilling Linguistic Context for Language Model Compression
Geondo Park
Gyeongman Kim
Eunho Yang
185
42
0
17 Sep 2021
Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
Songjun Cao
Yueteng Kang
Yanzhe Fu
Xiaoshuo Xu
Sining Sun
Yike Zhang
Long Ma
185
16
0
15 Sep 2021
EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
Chenhe Dong
Guangrun Wang
Hang Xu
Jiefeng Peng
Xiaozhe Ren
Xiaodan Liang
191
28
0
15 Sep 2021
Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Siddhant Garg
Alessandro Moschitti
168
29
0
14 Sep 2021
KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation
Marzieh S. Tahaei
Ella Charlaix
V. Nia
A. Ghodsi
Mehdi Rezagholizadeh
206
22
0
13 Sep 2021
On Language Models for Creoles
Heather Lent
Emanuele Bugliarello
Miryam de Lhoneux
Chen Qiu
Anders Søgaard
223
27
0
13 Sep 2021
Learning to Ground Visual Objects for Visual Dialog
Feilong Chen
Xiuyi Chen
Can Xu
Daxin Jiang
OOD
199
18
0
13 Sep 2021
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Tianda Li
Ahmad Rashid
A. Jafari
Pranav Sharma
A. Ghodsi
Mehdi Rezagholizadeh
AAML
285
5
0
13 Sep 2021
FLiText: A Faster and Lighter Semi-Supervised Text Classification with Convolution Networks
Chen Liu
Mengchao Zhang
Liang Pang
Jiafeng Guo
Xueqi Cheng
CLIP
161
20
0
12 Sep 2021
Block Pruning For Faster Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
François Lagunas
Ella Charlaix
Victor Sanh
Alexander M. Rush
VLM
252
252
0
10 Sep 2021
Learning to Teach with Student Feedback
Yitao Liu
Tianxiang Sun
Xipeng Qiu
Xuanjing Huang
VLM
153
6
0
10 Sep 2021
PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
ACM Multimedia (ACM MM), 2021
Zhi Qiao
Can Ma
Jin Wei
Wei Wang
Yuanqing Zhang
Ning Jiang
Hongbin Wang
Weiping Wang
250
80
0
09 Sep 2021
NU:BRIEF -- A Privacy-aware Newsletter Personalization Engine for Publishers
ACM Conference on Recommender Systems (RecSys), 2021
Ernesto Diaz-Aviles
Claudia Orellana-Rodriguez
Igor Brigadir
Reshma Narayanan Kutty
SyDa
102
0
0
08 Sep 2021
What's Hidden in a One-layer Randomly Weighted Transformer?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Sheng Shen
Z. Yao
Douwe Kiela
Kurt Keutzer
Michael W. Mahoney
164
6
0
08 Sep 2021
Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models
Eric Michael Smith
Adina Williams
246
31
0
07 Sep 2021
Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Canwen Xu
Wangchunshu Zhou
Tao Ge
Kelvin J. Xu
Julian McAuley
Furu Wei
236
46
0
07 Sep 2021
Sequential Attention Module for Natural Language Processing
Mengyuan Zhou
Jian Ma
Haiqing Yang
Lian-Xin Jiang
Yang Mo
AI4TS
97
2
0
07 Sep 2021
What Have Been Learned & What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification
Biyang Guo
S. Han
Hailiang Huang
135
5
0
01 Sep 2021
DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
ACM Transactions on Architecture and Code Optimization (TACO), 2020
Wei Niu
Jiexiong Guan
Yanzhi Wang
G. Agrawal
Bin Ren
AI4CE
237
189
0
30 Aug 2021
FedKD: Communication Efficient Federated Learning via Knowledge Distillation
Nature Communications (Nat Commun), 2021
Chuhan Wu
Fangzhao Wu
Lingjuan Lyu
Yongfeng Huang
Xing Xie
FedML
291
497
0
30 Aug 2021
AEDA: An Easier Data Augmentation Technique for Text Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Akbar Karimi
L. Rossi
Andrea Prati
184
191
0
30 Aug 2021
Analyzing and Mitigating Interference in Neural Architecture Search
International Conference on Machine Learning (ICML), 2021
Jin Xu
Xu Tan
Kaitao Song
Renqian Luo
Yichong Leng
Tao Qin
Tie-Yan Liu
Jian Li
MoMe
258
30
0
29 Aug 2021
Layer-wise Model Pruning based on Mutual Information
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Chun Fan
Jiwei Li
Xiang Ao
Leilei Gan
Yuxian Meng
Xiaofei Sun
158
23
0
28 Aug 2021
Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Beomsu Kim
Seokjun Seo
Seungju Han
Enkhbayar Erdenee
Buru Chang
RALM
223
6
0
28 Aug 2021
Code-switched inspired losses for generic spoken dialog representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
E. Chapuis
Pierre Colombo
Matthieu Labeau
Chloe Clave
370
12
0
27 Aug 2021
Can the Transformer Be Used as a Drop-in Replacement for RNNs in Text-Generating GANs?
Recent Advances in Natural Language Processing (RANLP), 2021
Kevin Blin
Andrei Kucharavy
256
2
0
26 Aug 2021
Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge
ACM Transactions on Embedded Computing Systems (TECS), 2021
Vinod Ganesan
Pratyush Kumar
280
2
0
25 Aug 2021
Influence-guided Data Augmentation for Neural Tensor Completion
International Conference on Information and Knowledge Management (CIKM), 2021
Sejoon Oh
Sungchul Kim
Ryan Rossi
Srijan Kumar
162
18
0
23 Aug 2021
Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches
Leonard Dahlmann
Tomer Lancewicki
MQ
132
3
0
23 Aug 2021
UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text
Soumajit Pramanik
Jesujoba Oluwadara Alabi
Rishiraj Saha Roy
Gerhard Weikum
RALM
806
35
0
19 Aug 2021
FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Jing Zhou
Yanan Zheng
Jie Tang
Jian Li
Zhilin Yang
VLM
273
91
0
13 Aug 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM, LM&MA
313
315
0
12 Aug 2021
Decoupled Transformer for Scalable Inference in Open-domain Question Answering
Recent Advances in Natural Language Processing (RANLP), 2021
Haytham ElFadeel
Stanislav Peshterliev
211
1
0
05 Aug 2021
Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification
Interspeech (Interspeech), 2021
Yiding Jiang
Bidisha Sharma
Maulik C. Madhavi
Haizhou Li
178
30
0
05 Aug 2021
AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yichun Yin
Cheng Chen
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
VLM
174
52
0
29 Jul 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
ACM Computing Surveys (CSUR), 2021
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM, SyDa
795
4,933
0
28 Jul 2021
An Argumentative Dialogue System for COVID-19 Vaccine Information
Chinese Conference on Logic and Argumentation (CLA), 2021
Bettina Fazzinga
Andrea Galassi
Paolo Torroni
204
19
0
26 Jul 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Tongtong Liu
Fangxiang Feng
Caixia Yuan
79
16
0
22 Jul 2021
Follow Your Path: a Progressive Method for Knowledge Distillation
Wenxian Shi
Yuxuan Song
Hao Zhou
Bohan Li
Lei Li
126
18
0
20 Jul 2021
Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search
Lei-tai Chen
Fajie Yuan
Jiaxi Yang
Min Yang
Chengming Li
162
4
0
15 Jul 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
467
88
0
13 Jul 2021
A Flexible Multi-Task Model for BERT Serving
Tianwen Wei
Jianwei Qi
Shenghuang He
103
8
0
12 Jul 2021
A Survey on Data Augmentation for Text Classification
Markus Bayer
M. Kaufhold
Christian A. Reuter
471
426
0
07 Jul 2021
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation
Zhiwei Hao
Jianyuan Guo
Ding Jia
Kai Han
Yehui Tang
Chao Zhang
Dacheng Tao
Yunhe Wang
ViT
442
90
0
03 Jul 2021