TinyBERT: Distilling BERT for Natural Language Understanding
Findings of EMNLP, 2020
arXiv 1909.10351, 23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Topics: VLM
Links: arXiv (abs) · PDF · HTML

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,056 citing papers (page 16 of 22)
VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Wu, Enhong Chen
08 Dec 2021

Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
Haofei Zhang, Jiarui Duan, Mengqi Xue, Mingli Song, Li Sun, Xiuming Zhang
Topics: ViT, AI4CE
07 Dec 2021

Causal Distillation for Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Zhengxuan Wu, Atticus Geiger, J. Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman
05 Dec 2021

Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz, Soroush Abbasi Koohpayegani, F. Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, Juergen Gall
Topics: ViT
30 Nov 2021

FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou
Topics: ViT, MQ
27 Nov 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Automatic Speech Recognition & Understanding (ASRU), 2021
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
22 Nov 2021

Can depth-adaptive BERT perform better on binary classification tasks
Jing Fan, Xin Zhang, Sheng Zhang, Yan Pan, Lixiang Guo
Topics: MQ
22 Nov 2021

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length
Shira Guskin, Moshe Wasserblat, Ke Ding, Gyuwan Kim
18 Nov 2021

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Pengcheng He, Jianfeng Gao, Weizhu Chen
18 Nov 2021

Dynamically pruning segformer for efficient semantic segmentation
Haoli Bai, Hongda Mao, D. Nair
18 Nov 2021

Character-level HyperNetworks for Hate Speech Detection
Expert systems with applications (ESWA), 2021
Tomer Wullach, A. Adler, Einat Minkov
11 Nov 2021

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Jiangchao Yao, Shengyu Zhang, Yang Yao, Feng Wang, Jianxin Ma, ..., Kun Kuang, Chao-Xiang Wu, Leilei Gan, Jingren Zhou, Hongxia Yang
11 Nov 2021

Prune Once for All: Sparse Pre-Trained Language Models
Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
Topics: VLM
10 Nov 2021

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
07 Nov 2021

Sampling Equivariant Self-attention Networks for Object Detection in Aerial Images
IEEE Transactions on Image Processing (TIP), 2021
Guo-Ye Yang, Xiang-Li Li, Ralph Robert Martin, Shimin Hu
Topics: 3DPC
05 Nov 2021

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment
Weixin Xu, Zipeng Feng, Shuangkang Fang, Song Yuan, Yi Yang, Shuchang Zhou
Topics: MQ
01 Nov 2021

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Xuanli He, I. Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi
30 Oct 2021

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Neural Information Processing Systems (NeurIPS), 2021
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang
29 Oct 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
Neural Information Processing Systems (NeurIPS), 2021
Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu
28 Oct 2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
Gongfan Fang, Yifan Bao, Mingli Song, Xinchao Wang, Don Xie, Chengchao Shen, Xiuming Zhang
27 Oct 2021

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
Topics: AIMat
25 Oct 2021

Vis-TOP: Visual Transformer Overlay Processor
Wei Hu, Dian Xu, Zimeng Fan, Fang Liu, Yanxiang He
Topics: BDL, ViT
21 Oct 2021

BERMo: What can BERT learn from ELMo?
Sangamesh Kodge, Kaushik Roy
18 Oct 2021

Self-Supervised Representation Learning: Introduction, Advances and Challenges
Linus Ericsson, Henry Gouk, Chen Change Loy, Timothy M. Hospedales
Topics: SSL, OOD, AI4TS
18 Oct 2021

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
Chenhe Dong, Yaliang Li, Ying Shen, Minghui Qiu
Topics: VLM
16 Oct 2021

Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
16 Oct 2021

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora
Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew O. Arnold, Xiang Ren
Topics: KELM, CLL
16 Oct 2021

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi
16 Oct 2021

A Short Study on Compressing Decoder-Based Language Models
Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
16 Oct 2021

Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Helen Zhou, Ahmed Hassan Awadallah
16 Oct 2021

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm
Shaoyi Huang, Dongkuan Xu, Ian En-Hsu Yen, Yijue Wang, Sung-En Chang, ..., Shiyang Chen, Mimi Xie, Sanguthevar Rajasekaran, Hang Liu, Caiwen Ding
Topics: CLL, VLM
15 Oct 2021

Kronecker Decomposition for GPT Compression
Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh
15 Oct 2021

SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer
Topics: VLM, LRM
15 Oct 2021

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu
Topics: ELM
13 Oct 2021

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese
Zhuosheng Zhang, Hanqing Zhang, Keming Chen, Yuhang Guo, Jingyun Hua, Yulong Wang, Ming Zhou
Topics: VLM
13 Oct 2021

Pre-trained Language Models in Biomedical Domain: A Systematic Survey
ACM Computing Surveys (CSUR), 2021
Benyou Wang, Qianqian Xie, Jiahuan Pei, Zhihong Chen, Prayag Tiwari, Zhao Li, Jie Fu
Topics: LM&MA, AI4CE
11 Oct 2021

Global Vision Transformer Pruning with Hessian-Aware Saliency
Computer Vision and Pattern Recognition (CVPR), 2021
Huanrui Yang, Hongxu Yin, Maying Shen, Pavlo Molchanov, Hai Helen Li, Jan Kautz
Topics: ViT
10 Oct 2021

SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions
Vinod Ganesan, Gowtham Ramesh, Pratyush Kumar
10 Oct 2021

LIDSNet: A Lightweight on-device Intent Detection model using Deep Siamese Network
Vibhav Agarwal, Sudeep Deepak Shivnikar, Sourav Ghosh, H. Arora, Yash Saini
06 Oct 2021

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Heng-Jui Chang, Shu-Wen Yang, Hung-yi Lee
Topics: SSL
05 Oct 2021

Data Augmentation Approaches in Natural Language Processing: A Survey
Bohan Li, Yutai Hou, Wanxiang Che
05 Oct 2021

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
Topics: MoE
05 Oct 2021

SDR: Efficient Neural Re-ranking using Succinct Document Representation
Nachshon Cohen, Amit Portnoy, B. Fetahu, A. Ingber
Topics: AI4TS
03 Oct 2021

SlovakBERT: Slovak Masked Language Model
Matúš Pikuliak, Stefan Grivalsky, Martin Konopka, Miroslav Blšták, Martin Tamajka, Viktor Bachratý, Marian Simko, Pavol Balázik, Michal Trnka, Filip Uhlárik
30 Sep 2021

Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu
Topics: MQ
30 Sep 2021

Deep Neural Compression Via Concurrent Pruning and Self-Distillation
J. Ó. Neill, Sourav Dutta, H. Assem
Topics: VLM
30 Sep 2021

FQuAD2.0: French Question Answering and knowing that you know nothing
Quentin Heinrich, Gautier Viaud, Wacim Belblidia
27 Sep 2021

Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
Topics: MQ
27 Sep 2021

Improving Question Answering Performance Using Knowledge Distillation and Active Learning
Engineering applications of artificial intelligence (EAAI), 2021
Yasaman Boreshban, Seyed Morteza Mirbostani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel, Shahin Amiriparian
26 Sep 2021

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
Cristobal Eyzaguirre, Felipe del-Rio, Vladimir Araujo, Alvaro Soto
24 Sep 2021