TinyBERT: Distilling BERT for Natural Language Understanding
Findings, 2019
23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,055 citing papers
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Patrick Lewis
Pontus Stenetorp
Sebastian Riedel
235
196
0
06 Aug 2020
ConvBERT: Improving BERT with Span-based Dynamic Convolution
Neural Information Processing Systems (NeurIPS), 2020
Zihang Jiang
Weihao Yu
Daquan Zhou
Yunpeng Chen
Jiashi Feng
Shuicheng Yan
318
195
0
06 Aug 2020
Understanding BERT Rankers Under Distillation
Luyu Gao
Zhuyun Dai
Jamie Callan
144
55
0
21 Jul 2020
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola
Albert Eaton Shaw
Ravi Krishna
Kurt Keutzer
230
136
0
19 Jun 2020
Knowledge Distillation: A Survey
Jianping Gou
B. Yu
Stephen J. Maybank
Dacheng Tao
1.6K
3,631
0
09 Jun 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou
Canwen Xu
Tao Ge
Julian McAuley
Ke Xu
Furu Wei
294
394
0
07 Jun 2020
Accelerating Natural Language Understanding in Task-Oriented Dialog
Ojas Ahuja
Shrey Desai
115
1
0
05 Jun 2020
An Overview of Neural Network Compression
James O'Neill
292
112
0
05 Jun 2020
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
2.0K
51,554
0
28 May 2020
Adversarial NLI for Factual Correctness in Text Summarisation Models
Mario Barrantes
Benedikt Herudek
Richard Wang
95
18
0
24 May 2020
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
D. Gao
Linbo Jin
Ben Chen
Minghui Qiu
Peng Li
Yi Wei
Yitao Hu
Haozhe Jasper Wang
193
146
0
20 May 2020
Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition
Yan Gao
Titouan Parcollet
Nicholas D. Lane
166
15
0
19 May 2020
Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation
Won Ik Cho
Donghyun Kwak
J. Yoon
N. Kim
229
27
0
17 May 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
295
549
0
15 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang
Hai Zhao
Rui Wang
192
66
0
13 May 2020
Distilling Knowledge from Pre-trained Language Models via Text Smoothing
Xing Wu
Zichen Liu
Xiangyang Zhou
Dianhai Yu
124
6
0
08 May 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh
Isak Edo
Omar Mohamed Awad
Andreas Moshovos
258
208
0
08 May 2020
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Qingqing Cao
H. Trivedi
A. Balasubramanian
Niranjan Balasubramanian
171
70
0
02 May 2020
When BERT Plays the Lottery, All Tickets Are Winning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Sai Prasanna
Anna Rogers
Anna Rumshisky
256
199
0
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
613
536
0
01 May 2020
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Ji Xin
Raphael Tang
Jaejun Lee
Yaoliang Yu
Jimmy J. Lin
197
430
0
27 Apr 2020
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2020
Omar Khattab
Matei A. Zaharia
357
1,714
0
27 Apr 2020
Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order
Yi-Lun Liao
Xin Jiang
Qun Liu
104
41
0
24 Apr 2020
The Cost of Training NLP Models: A Concise Overview
Or Sharir
Barak Peleg
Y. Shoham
203
229
0
19 Apr 2020
The Right Tool for the Job: Matching Model and Instance Complexities
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Roy Schwartz
Gabriel Stanovsky
Swabha Swayamdipta
Jesse Dodge
Noah A. Smith
297
177
0
16 Apr 2020
Training with Quantization Noise for Extreme Model Compression
International Conference on Learning Representations (ICLR), 2020
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Remi Gribonval
Armand Joulin
239
256
0
15 Apr 2020
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Subhabrata Mukherjee
Ahmed Hassan Awadallah
191
62
0
12 Apr 2020
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
International Conference on Computational Linguistics (COLING), 2020
Yihuan Mao
Yujing Wang
Chufan Wu
Chen Zhang
Yang-Feng Wang
Yaming Yang
Quanlu Zhang
Yunhai Tong
Jing Bai
139
80
0
08 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Neural Information Processing Systems (NeurIPS), 2020
Lu Hou
Zhiqi Huang
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
242
352
0
08 Apr 2020
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Xinyu Wang
Yong Jiang
Nguyen Bach
Tao Wang
Fei Huang
Kewei Tu
232
38
0
08 Apr 2020
On the Effect of Dropping Layers of Pre-trained Transformer Models
Computer Speech and Language (CSL), 2020
Hassan Sajjad
Fahim Dalvi
Nadir Durrani
Preslav Nakov
259
172
0
08 Apr 2020
Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
Bowen Wu
Huan Zhang
Mengyuan Li
Zongsheng Wang
Qihang Feng
Junhong Huang
Baoxun Wang
133
4
0
07 Apr 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Zhiqing Sun
Hongkun Yu
Xiaodan Song
Renjie Liu
Yiming Yang
Denny Zhou
359
916
0
06 Apr 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Haotang Deng
Qi Ju
228
392
0
05 Apr 2020
Pre-trained Models for Natural Language Processing: A Survey
Science China Technological Sciences (Sci China Technol Sci), 2020
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
933
1,607
0
18 Mar 2020
A Survey on Contextual Embeddings
Qi Liu
Matt J. Kusner
Phil Blunsom
433
169
0
16 Mar 2020
TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Ziqing Yang
Yiming Cui
Zhipeng Chen
Wanxiang Che
Ting Liu
Shijin Wang
Guoping Hu
179
50
0
28 Feb 2020
A Primer in BERTology: What we know about how BERT works
Transactions of the Association for Computational Linguistics (TACL), 2020
Anna Rogers
Olga Kovaleva
Anna Rumshisky
393
1,690
0
27 Feb 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Transactions of the Association for Computational Linguistics (TACL), 2020
Prakhar Ganesh
Yao Chen
Xin Lou
Mohammad Ali Khan
Yifan Yang
Hassan Sajjad
Preslav Nakov
Deming Chen
Marianne Winslett
393
213
0
27 Feb 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Neural Information Processing Systems (NeurIPS), 2020
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
833
1,707
0
25 Feb 2020
Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Journal of Computer Science and Technology (JCST), 2020
Yige Xu
Xipeng Qiu
L. Zhou
Xuanjing Huang
135
73
0
24 Feb 2020
TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu
Jian Jiao
Ruofei Zhang
171
52
0
14 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
644
216
0
07 Feb 2020
Aligning the Pretraining and Finetuning Objectives of Language Models
Nuo Wang Pierse
Jing Lu
95
2
0
05 Feb 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Daoyuan Chen
Yaliang Li
Minghui Qiu
Zhen Wang
Bofang Li
Bolin Ding
Hongbo Deng
Yanjie Liang
Jialin Li
Jingren Zhou
209
106
0
13 Jan 2020
The State of Knowledge Distillation for Classification
Fabian Ruffy
K. Chahal
162
21
0
20 Dec 2019
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian
A. Kreuzer
Pai-Hung Chen
Hans-Martin Will
162
3
0
13 Dec 2019
Unsupervised Pre-training for Natural Language Generation: A Literature Review
Yuanxin Liu
Zheng Lin
110
5
0
13 Nov 2019
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu
Haiquan Wang
Jimmy J. Lin
R. Socher
Caiming Xiong
178
23
0
09 Nov 2019
Blockwise Self-Attention for Long Document Understanding
Findings, 2019
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
272
269
0
07 Nov 2019