DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

2 October 2019
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
arXiv: 1910.01108
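
As background on the technique named in the title, the sketch below shows a generic soft-target knowledge distillation loss in the style of Hinton et al. (2015), which DistilBERT's training objective builds on (alongside a masked-language-modeling loss and a cosine embedding loss over hidden states). This is an illustrative PyTorch sketch, not code from the paper; the function name, temperature, and weighting are assumed defaults.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened student and
    # teacher distributions, scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # alpha balances imitating the teacher against the supervised signal.
    return alpha * soft + (1.0 - alpha) * hard

# Expected shapes: student_logits and teacher_logits are (batch, num_classes),
# labels is (batch,) of class indices.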

Papers citing "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"

31 / 131 papers shown
AutoMix: Automatically Mixing Language Models
Pranjal Aggarwal, Aman Madaan, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, ..., Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Manaal Faruqui, Mausam
19 Oct 2023 · 20 citations

An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records
Fabio Azzalini, Tommaso Dolci, Marco Vagaggini
16 Oct 2023 · 1 citation · Topics: OOD

Certifying LLM Safety against Adversarial Prompting
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
06 Sep 2023 · 182 citations · Topics: AAML

Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Walter Hernandez Cruz, K. Tylinski, Alastair Moore, Niall Roche, Nikhil Vadgama, Horst Treiblmaier, J. Shangguan, Paolo Tasca, Jiahua Xu
23 Aug 2023 · 2 citations

UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition
Aidan Mannion, Thierry Chevalier, D. Schwab, Lorraine Goeuriot
20 Jul 2023 · 3 citations · Topics: MedIm

Generalizable Synthetic Image Detection via Language-guided Contrastive Learning
Haiwei Wu, Jiantao Zhou, Shile Zhang
23 May 2023 · 30 citations

Curating corpora with classifiers: A case study of clean energy sentiment online
M. V. Arnold, P. Dodds, C. Danforth
04 May 2023 · 0 citations

Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR
Juri Opitz, Anette Frank
20 Aug 2020 · 35 citations

Ensemble Distillation for Robust Model Fusion in Federated Learning
Tao R. Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi
12 Jun 2020 · 1,026 citations · Topics: FedML

Efficient Intent Detection with Dual Sentence Encoders
I. Casanueva, Tadas Temčinas, D. Gerz, Matthew Henderson, Ivan Vulić
10 Mar 2020 · 463 citations · Topics: VLM

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Yada Pruksachatkun, Philip Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Jinpeng Wang, Ian Tenney, Samuel R. Bowman
04 Mar 2020 · 94 citations · Topics: SSeg

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
23 Sep 2019 · 1,838 citations · Topics: VLM

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019 · 1,861 citations · Topics: MoE

Small and Practical BERT Models for Sequence Labeling
Henry Tsai, Jason Riesa, Melvin Johnson, N. Arivazhagan, Xin Li, Amelia Archer
31 Aug 2019 · 121 citations · Topics: VLM

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
23 Aug 2019 · 224 citations

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
26 Jul 2019 · 24,160 citations · Topics: AIMat

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
19 Jun 2019 · 8,386 citations · Topics: AI4CE

Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum
05 Jun 2019 · 2,633 citations

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig
25 May 2019 · 1,049 citations · Topics: MoE

Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System
Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang
21 Apr 2019 · 20 citations · Topics: KELM

Making Neural Machine Reading Comprehension Faster
Debajyoti Chatterjee
29 Mar 2019 · 9 citations · Topics: AIMat

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
28 Mar 2019 · 419 citations

Improved Knowledge Distillation via Teacher Assistant
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, H. Ghasemzadeh
09 Feb 2019 · 1,073 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018 · 93,936 citations · Topics: VLM, SSL, SSeg

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018 · 7,080 citations · Topics: ELM

Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
15 Feb 2018 · 11,520 citations · Topics: NAI

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017 · 129,831 citations · Topics: 3DV

SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
16 Jun 2016 · 8,067 citations · Topics: RALM

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Yukun Zhu, Ryan Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler
22 Jun 2015 · 2,529 citations

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
09 Mar 2015 · 19,448 citations · Topics: FedML

Deep Learning with Limited Numerical Precision
Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan
09 Feb 2015 · 2,041 citations · Topics: HAI