TinyBERT: Distilling BERT for Natural Language Understanding
Findings of EMNLP, 2020 (arXiv: 23 September 2019)
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu. [VLM]

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,055 citing papers.
Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration. AAAI, 2020. Lei Sha, Oana-Maria Camburu, Thomas Lukasiewicz. 16 Dec 2020.
A Lightweight Neural Model for Biomedical Entity Linking. AAAI, 2020. Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek. 16 Dec 2020. [MedIm]
EmpLite: A Lightweight Sequence Labeling Model for Emphasis Selection of Short Texts. ICON, 2020. Vibhav Agarwal, Sourav Ghosh, Kranti Chalamalasetti, B. Challa, S. Kumari, Harshavardhana, Barath Raj Kandur Raja. 15 Dec 2020.
Parameter-Efficient Transfer Learning with Diff Pruning. ACL, 2020. Demi Guo, Alexander M. Rush, Yoon Kim. 14 Dec 2020.
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding. AAAI, 2020. Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li. 14 Dec 2020.
MiniVLM: A Smaller and Faster Vision-Language Model. Jianfeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Guang Dai, Jianfeng Gao, Zicheng Liu. 13 Dec 2020. [VLM, MLLM]
Reinforced Multi-Teacher Selection for Knowledge Distillation. AAAI, 2020. Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang. 11 Dec 2020.
Improving Task-Agnostic BERT Distillation with Layer Mapping Search. Neurocomputing, 2020. Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. 11 Dec 2020.
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains. ACL, 2020. Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Yanjie Liang. 02 Dec 2020.
CPM: A Large-scale Generative Chinese Pre-trained Language Model. AI Open, 2020. Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, ..., Wentao Han, Jie Tang, Juan-Zi Li, Xiaoyan Zhu, Maosong Sun. 01 Dec 2020.
A Selective Survey on Versatile Knowledge Distillation Paradigm for Neural Network Models. J. Ku, Jihun Oh, Youngyoon Lee, Gaurav Pooniwala, Sangjeong Lee. 30 Nov 2020.
Bringing AI To Edge: From Deep Learning's Perspective. Neurocomputing, 2020. Di Liu, Hao Kong, Xiangzhong Luo, Weichen Liu, Ravi Subramaniam. 25 Nov 2020.
EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications. CIKM, 2020. Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Yanjie Liang, Deng Cai, Jialin Li. 18 Nov 2020. [VLM, SyDa]
Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads. Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun. 07 Nov 2020. [VLM]
Influence Patterns for Explaining Information Flow in BERT. NeurIPS, 2020. Kaiji Lu, Zifan Wang, Piotr (Peter) Mardziel, Anupam Datta. 02 Nov 2020. [GNN]
FastFormers: Highly Efficient Transformer Models for Natural Language Understanding. Young Jin Kim, Hany Awadalla. 26 Oct 2020. [AI4CE]
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. NeurIPS, 2020. Minjia Zhang, Yuxiong He. 26 Oct 2020. [AI4CE]
Pre-trained Summarization Distillation. Sam Shleifer, Alexander M. Rush. 24 Oct 2020.
Knowledge Distillation for Improved Accuracy in Spoken Question Answering. Chenyu You, Polydoros Giannouris, Yuexian Zou. 21 Oct 2020.
Optimal Subarchitecture Extraction For BERT. Adrian de Wynter, Daniel J. Perry. 20 Oct 2020. [MQ]
BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search. Industrial Conference on Data Mining, 2020. Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng P. Yan, Di Jin. 20 Oct 2020.
HABERTOR: An Efficient and Effective Deep Hatespeech Detector. EMNLP, 2020. T. Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Serim Park. 17 Oct 2020. [VLM]
AutoADR: Automatic Model Design for Ad Relevance. Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, Ruofei Zhang. 14 Oct 2020.
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression. Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin. 14 Oct 2020.
Pretrained Transformers for Text Ranking: BERT and Beyond. Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates. 13 Oct 2020. [VLM]
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance. Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin. 13 Oct 2020.
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually). Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman. 11 Oct 2020. [SSL, AI4CE]
Adversarial Self-Supervised Data-Free Distillation for Text Classification. EMNLP, 2020. Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu. 10 Oct 2020.
Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding. Interspeech, 2020. Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li. 09 Oct 2020.
Deep Learning Meets Projective Clustering. ICLR, 2020. Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman. 08 Oct 2020.
On the importance of pre-training data volume for compact language models. Vincent Micheli, Martin d'Hoffschmidt, François Fleuret. 08 Oct 2020.
AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models. Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, A. Raghunathan. 07 Oct 2020.
CATBERT: Context-Aware Tiny BERT for Detecting Social Engineering Emails. Younghoon Lee, Joshua Saxe, Richard E. Harang. 07 Oct 2020.
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers. Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu. 06 Oct 2020. [MoE]
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior. Findings, 2020. Zi Lin, Jeremiah Zhe Liu, Ziao Yang, Nan Hua, Dan Roth. 05 Oct 2020.
Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT. Ikhyun Cho, U. Kang. 30 Sep 2020.
Contrastive Distillation on Intermediate Representations for Language Model Compression. S. Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu. 29 Sep 2020. [VLM]
TernaryBERT: Distillation-aware Ultra-low Bit BERT. EMNLP, 2020. Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu. 27 Sep 2020. [MQ]
A little goes a long way: Improving toxic language classification despite data scarcity. Findings, 2020. Mika Juuti, Tommi Gröndahl, Adrian Flanagan, Nirmal Asokan. 25 Sep 2020.
RecoBERT: A Catalog Language Model for Text-Based Recommendations. Findings, 2020. Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein. 25 Sep 2020.
Hierarchical Pre-training for Sequence Labelling in Spoken Dialog. Findings, 2020. E. Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloé Clavel. 23 Sep 2020.
Weight Distillation: Transferring the Knowledge in Neural Network Parameters. ACL, 2020. Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu. 19 Sep 2020.
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning. Findings, 2020. Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Hao Sun, Hang Liu, Caiwen Ding. 17 Sep 2020. [VLM]
Simplified TinyBERT: Knowledge Distillation for Document Retrieval. ECIR, 2020. Xuanang Chen, Xianpei Han, Kai Hui, Le Sun, Yingfei Sun. 16 Sep 2020.
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. NAACL, 2020. Timo Schick, Hinrich Schütze. 15 Sep 2020.
Real-Time Execution of Large-scale Language Models on Mobile. Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang. 15 Sep 2020. [MQ]
Efficient Transformers: A Survey. ACM Computing Surveys, 2020. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 14 Sep 2020. [VLM]
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation. M. Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman. 11 Sep 2020.
Pay Attention when Required. Swetha Mandava, Szymon Migacz, A. Fit-Florea. 09 Sep 2020.
Compression of Deep Learning Models for Text: A Survey. ACM TKDD, 2020. Manish Gupta, Puneet Agrawal. 12 Aug 2020. [VLM, MedIm, AI4CE]