arXiv:1910.01108
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
2 October 2019
Papers citing "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"
31 / 131 papers shown

AutoMix: Automatically Mixing Language Models
Pranjal Aggarwal, Aman Madaan, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, ..., Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Manaal Faruqui, Mausam
67 · 20 · 0 · 19 Oct 2023

An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records
Fabio Azzalini, Tommaso Dolci, Marco Vagaggini
OOD · 113 · 1 · 0 · 16 Oct 2023

Certifying LLM Safety against Adversarial Prompting
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
AAML · 47 · 182 · 0 · 06 Sep 2023

Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Walter Hernandez Cruz, K. Tylinski, Alastair Moore, Niall Roche, Nikhil Vadgama, Horst Treiblmaier, J. Shangguan, Paolo Tasca, Jiahua Xu
52 · 2 · 0 · 23 Aug 2023

UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition
Aidan Mannion, Thierry Chevalier, D. Schwab, Lorraine Goeuriot
MedIm · 62 · 3 · 0 · 20 Jul 2023

Generalizable Synthetic Image Detection via Language-guided Contrastive Learning
Haiwei Wu, Jiantao Zhou, Shile Zhang
137 · 30 · 0 · 23 May 2023

Curating corpora with classifiers: A case study of clean energy sentiment online
M. V. Arnold, P. Dodds, C. Danforth
51 · 0 · 0 · 04 May 2023

Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR
Juri Opitz, Anette Frank
94 · 35 · 0 · 20 Aug 2020

Ensemble Distillation for Robust Model Fusion in Federated Learning
Tao R. Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi
FedML · 47 · 1,026 · 0 · 12 Jun 2020

Efficient Intent Detection with Dual Sentence Encoders
I. Casanueva, Tadas Temčinas, D. Gerz, Matthew Henderson, Ivan Vulić
VLM · 271 · 463 · 0 · 10 Mar 2020

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Yada Pruksachatkun, Philip Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman
SSeg · 23 · 94 · 0 · 04 Mar 2020

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
VLM · 44 · 1,838 · 0 · 23 Sep 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 281 · 1,861 · 0 · 17 Sep 2019

Small and Practical BERT Models for Sequence Labeling
Henry Tsai, Jason Riesa, Melvin Johnson, N. Arivazhagan, Xin Li, Amelia Archer
VLM · 22 · 121 · 0 · 31 Aug 2019

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
44 · 224 · 0 · 23 Aug 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
AIMat · 377 · 24,160 · 0 · 26 Jul 2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
AI4CE · 160 · 8,386 · 0 · 19 Jun 2019

Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell, Ananya Ganesh, Andrew McCallum
43 · 2,633 · 0 · 05 Jun 2019

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig
MoE · 64 · 1,049 · 0 · 25 May 2019

Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System
Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang
KELM · 18 · 20 · 0 · 21 Apr 2019

Making Neural Machine Reading Comprehension Faster
Debajyoti Chatterjee
AIMat · 40 · 9 · 0 · 29 Mar 2019

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
52 · 419 · 0 · 28 Mar 2019

Improved Knowledge Distillation via Teacher Assistant
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Nir Levine, Akihiro Matsukawa, H. Ghasemzadeh
79 · 1,073 · 0 · 09 Feb 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM · SSL · SSeg · 856 · 93,936 · 0 · 11 Oct 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 572 · 7,080 · 0 · 20 Apr 2018

Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
NAI · 93 · 11,520 · 0 · 15 Feb 2018

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
3DV · 351 · 129,831 · 0 · 12 Jun 2017

SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
RALM · 133 · 8,067 · 0 · 16 Jun 2016

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Yukun Zhu, Ryan Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler
92 · 2,529 · 0 · 22 Jun 2015

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
FedML · 183 · 19,448 · 0 · 09 Mar 2015

Deep Learning with Limited Numerical Precision
Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan
HAI · 109 · 2,041 · 0 · 09 Feb 2015