TinyBERT: Distilling BERT for Natural Language Understanding
Findings of EMNLP, 2020
arXiv:1909.10351, 23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

50 / 1,056 papers shown
Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal
Findings, 2022
Umang Gupta
Jwala Dhamala
Varun Kumar
Apurv Verma
Yada Pruksachatkun
Satyapriya Krishna
Rahul Gupta
Kai-Wei Chang
Greg Ver Steeg
Aram Galstyan
174
61
0
23 Mar 2022
Input-specific Attention Subnetworks for Adversarial Detection
Findings, 2022
Emil Biju
Anirudh Sriram
Pratyush Kumar
Mitesh M Khapra
AAML
162
5
0
23 Mar 2022
Text Transformations in Contrastive Self-Supervised Learning: A Review
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Amrita Bhattacharjee
Mansooreh Karami
Huan Liu
SSL
381
23
0
22 Mar 2022
Out-of-distribution Generalization with Causal Invariant Transformations
Computer Vision and Pattern Recognition (CVPR), 2022
Ruoyu Wang
Mingyang Yi
Zhitang Chen
Shengyu Zhu
OOD, OODD
256
80
0
22 Mar 2022
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zheng Li
Zijian Wang
Ming Tan
Ramesh Nallapati
Parminder Bhatia
Andrew O. Arnold
Bing Xiang
Dan Roth
MQ
171
46
0
21 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
262
116
0
21 Mar 2022
When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation
Findings, 2022
Ehsan Kamalloo
Mehdi Rezagholizadeh
A. Ghodsi
218
11
0
17 Mar 2022
Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation
Findings, 2022
Xuandong Zhao
Zhiguo Yu
Ming-li Wu
Lei Li
113
8
0
15 Mar 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Eldar Kurtic
Daniel Fernando Campos
Tuan Nguyen
Elias Frantar
Mark Kurtz
Ben Fineran
Michael Goin
Dan Alistarh
VLM, MQ, MedIm
395
146
0
14 Mar 2022
BiBERT: Accurate Fully Binarized BERT
International Conference on Learning Representations (ICLR), 2022
Haotong Qin
Yifu Ding
Mingyuan Zhang
Qing Yan
Aishan Liu
Qingqing Dang
Ziwei Liu
Xianglong Liu
MQ
195
113
0
12 Mar 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Findings, 2022
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
235
107
0
12 Mar 2022
LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval
Jie Lei
Xinlei Chen
Ning Zhang
Meng-xing Wang
Joey Tianyi Zhou
Tamara L. Berg
Licheng Yu
267
15
0
10 Mar 2022
Knowledge Amalgamation for Object Detection with Transformers
IEEE Transactions on Image Processing (IEEE TIP), 2022
Haofei Zhang
Feng Mao
Mengqi Xue
Gongfan Fang
Zunlei Feng
Mingli Song
Weilong Dai
ViT
385
16
0
07 Mar 2022
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Findings, 2022
Tianxiang Sun
Xiangyang Liu
Wei-wei Zhu
Zhichao Geng
Lingling Wu
Yilong He
Yuan Ni
Guotong Xie
Xuanjing Huang
Xipeng Qiu
254
42
0
03 Mar 2022
E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mohammad Akbari
Amin Banitalebi-Dehkordi
Yong Zhang
181
9
0
01 Mar 2022
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
R. Liu
Kailun Yang
Alina Roitberg
Kailai Li
Kunyu Peng
Huayao Liu
Yaonan Wang
Rainer Stiefelhagen
ViT
276
58
0
27 Feb 2022
Art Creation with Multi-Conditional StyleGANs
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Konstantin Dobler
Florian Hübscher
Jan Westphal
Alejandro Sierra-Múnera
Gerard de Melo
Ralf Krestel
GAN, AI4CE
267
8
0
23 Feb 2022
LAMP: Extracting Text from Gradients with Language Model Priors
Neural Information Processing Systems (NeurIPS), 2022
Mislav Balunović
Dimitar I. Dimitrov
Nikola Jovanović
Martin Vechev
318
78
0
17 Feb 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
351
276
0
16 Feb 2022
A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Canwen Xu
Julian McAuley
359
87
0
15 Feb 2022
What is Next when Sequential Prediction Meets Implicitly Hard Interaction?
International Conference on Information and Knowledge Management (CIKM), 2021
Kaixi Hu
Lin Li
Qing Xie
Jianquan Liu
Xiaohui Tao
171
22
0
14 Feb 2022
pNLP-Mixer: an Efficient all-MLP Architecture for Language
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Francesco Fusco
Damian Pascual
Peter W. J. Staar
Diego Antognini
209
34
0
09 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
International Conference on Machine Learning (ICML), 2022
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL, VLM, ViT
584
1,037
0
07 Feb 2022
Aspect-based Sentiment Analysis through EDU-level Attentions
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2022
Ting Lin
Aixin Sun
Yequan Wang
150
7
0
05 Feb 2022
AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu
Subhabrata Mukherjee
Xiaodong Liu
Debadeepta Dey
Wenhui Wang
Xiang Zhang
Ahmed Hassan Awadallah
Jianfeng Gao
205
5
0
29 Jan 2022
Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Haoyu Dong
Zhoujun Cheng
Xinyi He
Mengyuan Zhou
Anda Zhou
Fan Zhou
Ao Liu
Shi Han
Dongmei Zhang
LMTD
427
74
0
24 Jan 2022
Can Model Compression Improve NLP Fairness
Guangxuan Xu
Qingyuan Hu
146
30
0
21 Jan 2022
AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
Xiaofan Zhang
Zongwei Zhou
Deming Chen
Yu Emma Wang
173
12
0
21 Jan 2022
VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer
Mengshu Sun
Haoyu Ma
Guoliang Kang
Lezhi Li
Tianlong Chen
Xiaolong Ma
Zinan Lin
Yanzhi Wang
ViT
283
54
0
17 Jan 2022
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yoshitomo Matsubara
Luca Soldaini
Eric Lind
Alessandro Moschitti
235
7
0
15 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP, VLM
213
44
0
15 Jan 2022
Pretrained Language Models for Text Generation: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Junyi Li
Tianyi Tang
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
AI4CE
535
268
0
14 Jan 2022
Latency Adjustable Transformer Encoder for Language Understanding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Sajjad Kachuee
M. Sharifkhani
590
1
0
10 Jan 2022
ThreshNet: An Efficient DenseNet Using Threshold Mechanism to Reduce Connections
IEEE Access, 2022
Ruikang Ju
Ting-Yu Lin
Jia-Hao Jian
Jen-Shiun Chiang
Weida Yang
260
9
0
09 Jan 2022
Fortunately, Discourse Markers Can Enhance Language Models for Sentiment Analysis
AAAI Conference on Artificial Intelligence (AAAI), 2022
L. Ein-Dor
Ilya Shnayderman
Artem Spector
Lena Dankin
R. Aharonov
Noam Slonim
213
9
0
06 Jan 2022
Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Rendi Chevi
Radityo Eko Prasojo
Alham Fikri Aji
181
7
0
03 Jan 2022
Automatic Mixed-Precision Quantization Search of BERT
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Changsheng Zhao
Ting Hua
Yilin Shen
Qian Lou
Hongxia Jin
MQ
171
26
0
30 Dec 2021
An Efficient Combinatorial Optimization Model Using Learning-to-Rank Distillation
AAAI Conference on Artificial Intelligence (AAAI), 2021
Honguk Woo
Hyunsung Lee
Sangwook Cho
261
7
0
24 Dec 2021
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang
Yu Sun
Yang Xiang
Zhihua Wu
Siyu Ding
...
Tian Wu
Wei Zeng
Ge Li
Wen Gao
Haifeng Wang
ELM
214
87
0
23 Dec 2021
Distilling the Knowledge of Romanian BERTs Using Multiple Teachers
International Conference on Language Resources and Evaluation (LREC), 2021
Andrei-Marius Avram
Darius Catrina
Dumitru-Clementin Cercel
Mihai Dascălu
Traian Rebedea
Vasile Păiș
Dan Tufiș
343
14
0
23 Dec 2021
Sublinear Time Approximation of Text Similarity Matrices
AAAI Conference on Artificial Intelligence (AAAI), 2021
Archan Ray
Nicholas Monath
Andrew McCallum
Cameron Musco
303
7
0
17 Dec 2021
Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation
Bichen Wu
Rui Cheng
Peizhao Zhang
Tianren Gao
Peter Vajda
Joseph E. Gonzalez
VLM
322
54
0
17 Dec 2021
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang
Wenhui Wang
Haichao Zhu
Ming Liu
Bing Qin
Furu Wei
VLM, FedML
214
35
0
16 Dec 2021
AdaViT: Adaptive Tokens for Efficient Vision Transformer
Hongxu Yin
Arash Vahdat
J. Álvarez
Arun Mallya
Jan Kautz
Pavlo Molchanov
ViT
647
449
0
14 Dec 2021
LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework
Mengjie Zhao
Fei Mi
Yasheng Wang
Minglei Li
Xin Jiang
Qun Liu
Hinrich Schütze
RALM
283
12
0
14 Dec 2021
Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models
Lei Li
Yankai Lin
Xuancheng Ren
Guangxiang Zhao
Peng Li
Jie Zhou
Xu Sun
MoMe
143
2
0
14 Dec 2021
From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression
Runxin Xu
Fuli Luo
Chengyu Wang
Baobao Chang
Yanjie Liang
Songfang Huang
Fei Huang
VLM
119
31
0
14 Dec 2021
On the Compression of Natural Language Models
S. Damadi
92
0
0
13 Dec 2021
Pruning Pretrained Encoders with a Multitask Objective
Patrick Xia
Richard Shin
132
0
0
10 Dec 2021
DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings
Chaochen Gao
Xing Wu
Peng Wang
Jue Wang
Liangjun Zang
Zhongyuan Wang
Songlin Hu
174
5
0
10 Dec 2021
Page 15 of 22