1909.10351
TinyBERT: Distilling BERT for Natural Language Understanding
Findings (Findings), 2019
23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
Papers citing
"TinyBERT: Distilling BERT for Natural Language Understanding"
50 / 1,056 papers shown
Learned Token Pruning for Transformers
Sehoon Kim
Sheng Shen
D. Thorsley
A. Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
356
194
0
02 Jul 2021
Knowledge Distillation for Quality Estimation
Amit Gajbhiye
M. Fomicheva
Fernando Alva-Manchego
Frédéric Blain
A. Obamuyide
Nikolaos Aletras
Lucia Specia
247
11
0
01 Jul 2021
Elbert: Fast Albert with Confidence-Window Based Early Exit
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Keli Xie
Siyuan Lu
Meiqi Wang
Zhongfeng Wang
184
23
0
01 Jul 2021
On the Interaction of Belief Bias and Explanations
Findings (Findings), 2021
Ana Valeria González
Anna Rogers
Anders Søgaard
FAtt
225
20
0
29 Jun 2021
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
Yunzhi Yao
Shaohan Huang
Wenhui Wang
Li Dong
Furu Wei
VLM
ALM
232
55
0
25 Jun 2021
Data Augmentation for Opcode Sequence Based Malware Detection
Niall McLaughlin
Jesus Martinez del Rincon
128
10
0
22 Jun 2021
LV-BERT: Exploiting Layer Variety for BERT
Findings (Findings), 2021
Weihao Yu
Zihang Jiang
Fei Chen
Qibin Hou
Jiashi Feng
MQ
156
0
0
22 Jun 2021
Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic
Recent Advances in Natural Language Processing (RANLP), 2021
Ye Jiang
Xingyi Song
Carolina Scarton
Ahmet Aker
Kalina Bontcheva
322
17
0
22 Jun 2021
Direction is what you need: Improving Word Embedding Compression in Large Language Models
Klaudia Bałazy
Mohammadreza Banaei
R. Lebret
Jacek Tabor
Karl Aberer
120
9
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
392
998
0
14 Jun 2021
Why Can You Lay Off Heads? Investigating How BERT Heads Transfer
Ting-Rui Chiang
Yun-Nung Chen
92
0
0
14 Jun 2021
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Mingyu Ding
Xiaochen Lian
Linjie Yang
Peng Wang
Xiaojie Jin
Zhiwu Lu
Ping Luo
ViT
242
74
0
11 Jun 2021
Generate, Annotate, and Learn: NLP with Synthetic Text
Transactions of the Association for Computational Linguistics (TACL), 2021
Xuanli He
Islam Nassar
J. Kiros
Gholamreza Haffari
Mohammad Norouzi
326
66
0
11 Jun 2021
RefBERT: Compressing BERT by Referencing to Pre-computed Representations
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Xinyi Wang
Haiqing Yang
Liang Zhao
Yang Mo
Jianping Shen
MQ
168
4
0
11 Jun 2021
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yuanxin Liu
Fandong Meng
Zheng Lin
Weiping Wang
Jie Zhou
78
6
0
10 Jun 2021
AUGNLG: Few-shot Natural Language Generation using Self-trained Data Augmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Xinnuo Xu
Guoyin Wang
Young-Bum Kim
Sungjin Lee
156
33
0
10 Jun 2021
BERT Learns to Teach: Knowledge Distillation with Meta Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Wangchunshu Zhou
Canwen Xu
Julian McAuley
321
107
0
08 Jun 2021
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
Subhabrata Mukherjee
Ahmed Hassan Awadallah
Jianfeng Gao
241
23
0
08 Jun 2021
Multi-hop Graph Convolutional Network with High-order Chebyshev Approximation for Text Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Shuoran Jiang
Qingcai Chen
Xin Liu
Baotian Hu
Lisai Zhang
124
3
0
08 Jun 2021
RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
Xin Guo
Jianlei Yang
Haoyi Zhou
Xucheng Ye
Jianxin Li
114
2
0
07 Jun 2021
You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient
Shaokun Zhang
Xiawu Zheng
Chenyi Yang
Yuchao Li
Yan Wang
Jiayi Ji
Mengdi Wang
Shen Li
Jun Yang
Rongrong Ji
MQ
178
23
0
04 Jun 2021
ERNIE-Tiny : A Progressive Distillation Framework for Pretrained Transformer Compression
Weiyue Su
Xuyi Chen
Shi Feng
Jiaxiang Liu
Weixin Liu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
194
13
0
04 Jun 2021
Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Peiyu Liu
Ze-Feng Gao
Wayne Xin Zhao
Z. Xie
Zhong-Yi Lu
Ji-Rong Wen
108
33
0
04 Jun 2021
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Neural Information Processing Systems (NeurIPS), 2021
Yongming Rao
Wenliang Zhao
Benlin Liu
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
ViT
529
932
0
03 Jun 2021
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
Findings (Findings), 2021
Chuhan Wu
Fangzhao Wu
Yongfeng Huang
182
71
0
02 Jun 2021
Towards Quantifiable Dialogue Coherence Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Zheng Ye
Liucun Lu
Lishan Huang
Liang Lin
Xiaodan Liang
148
33
0
01 Jun 2021
DoT: An efficient Double Transformer for NLP tasks with tables
Findings (Findings), 2021
Syrine Krichene
Thomas Müller
Julian Martin Eisenschlos
203
16
0
01 Jun 2021
Distribution Matching for Rationalization
AAAI Conference on Artificial Intelligence (AAAI), 2021
Yongfeng Huang
Yujun Chen
Yulun Du
Zhilin Yang
OOD
172
21
0
01 Jun 2021
Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
Shuai Bai
Zhedong Zheng
Xiaohan Wang
Junyang Lin
Zhu Zhang
Chang Zhou
Yi Yang
Hongxia Yang
240
30
0
31 May 2021
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing
Pattern Recognition Letters (PR), 2021
David Peer
Sebastian Stabinger
Stefan Engl
A. Rodríguez-Sánchez
197
31
0
31 May 2021
LEAP: Learnable Pruning for Transformer-based Models
Z. Yao
Xiaoxia Wu
Linjian Ma
Sheng Shen
Kurt Keutzer
Michael W. Mahoney
Yuxiong He
214
8
0
30 May 2021
NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Knowledge Discovery and Data Mining (KDD), 2021
Jin Xu
Xu Tan
Renqian Luo
Kaitao Song
Jian Li
Tao Qin
Tie-Yan Liu
MQ
152
90
0
30 May 2021
Knowledge Inheritance for Pre-trained Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Yujia Qin
Yankai Lin
Jing Yi
Jiajie Zhang
Xu Han
...
Yusheng Su
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
VLM
240
56
0
28 May 2021
Accelerating BERT Inference for Sequence Labeling via Early-Exit
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Xiaonan Li
Yunfan Shao
Tianxiang Sun
Hang Yan
Xipeng Qiu
Xuanjing Huang
279
43
0
28 May 2021
Lightweight Cross-Lingual Sentence Representation Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Zhuoyuan Mao
Prakhar Gupta
Pei Wang
Chenhui Chu
Martin Jaggi
Sadao Kurohashi
VLM
334
9
0
28 May 2021
Early Exiting with Ensemble Internal Classifiers
Tianxiang Sun
Yunhua Zhou
Xiangyang Liu
Xinyu Zhang
Hao Jiang
Bo Zhao
Xuanjing Huang
Xipeng Qiu
157
35
0
28 May 2021
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax
Findings (Findings), 2021
Ehsan Kamalloo
Mehdi Rezagholizadeh
Peyman Passban
Ali Ghodsi
AAML
200
17
0
28 May 2021
Selective Knowledge Distillation for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Fusheng Wang
Jianhao Yan
Fandong Meng
Jie Zhou
203
67
0
27 May 2021
TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Deming Ye
Yankai Lin
Yufei Huang
Maosong Sun
MQ
207
74
0
25 May 2021
Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021
Sebastian Hofstatter
Bhaskar Mitra
Hamed Zamani
Nick Craswell
Allan Hanbury
183
47
0
20 May 2021
BERT Busters: Outlier Dimensions that Disrupt Transformers
Findings (Findings), 2021
Olga Kovaleva
Saurabh Kulshreshtha
Anna Rogers
Anna Rumshisky
453
112
0
14 May 2021
Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters
Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2021
Yan Xu
Etsuko Ishii
Samuel Cahyawijaya
Zihan Liu
Genta Indra Winata
Andrea Madotto
Jane Polak Scowcroft
Pascale Fung
RALM
183
47
0
13 May 2021
MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Ahmad Rashid
Vasileios Lioutas
Mehdi Rezagholizadeh
AAML
248
40
0
12 May 2021
FNet: Mixing Tokens with Fourier Transforms
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
James Lee-Thorp
Joshua Ainslie
Ilya Eckstein
Santiago Ontanon
658
645
0
09 May 2021
Easy and Efficient Transformer : Scalable Inference Solution For large NLP model
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
GongZheng Li
Yadong Xi
Jingzhen Ding
Duan Wang
Bai Liu
Changjie Fan
Xiaoxi Mao
Zeng Zhao
270
11
0
26 Apr 2021
Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
International Conference on Artificial Neural Networks (ICANN), 2021
Cheng Chen
Yichun Yin
Lifeng Shang
Zhi Wang
Xin Jiang
Xiao Chen
Qun Liu
FedML
149
9
0
24 Apr 2021
Disfluency Detection with Unlabeled Data and Small BERT Models
Interspeech (Interspeech), 2021
Johann C. Rocholl
Vicky Zayats
D. D. Walker
Noah B. Murad
Aaron Schneider
Daniel J. Liebling
175
33
0
21 Apr 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
216
31
0
20 Apr 2021
Knowledge Distillation as Semiparametric Inference
International Conference on Learning Representations (ICLR), 2021
Tri Dao
G. Kamath
Vasilis Syrgkanis
Lester W. Mackey
233
37
0
20 Apr 2021
Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Dongkuan Xu
Ian En-Hsu Yen
Jinxi Zhao
Zhibin Xiao
VLM
AAML
193
66
0
18 Apr 2021