arXiv: 1908.08962
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
23 August 2019
Iulia Turc
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
Papers citing
"Well-Read Students Learn Better: On the Importance of Pre-training Compact Models"
30 / 30 papers shown
To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay
Soumik Dey
Hansi Wu
Binbin Li
68
1
0
07 May 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
75
0
0
29 Mar 2025
Banyan: Improved Representation Learning with Explicit Structure
Mattia Opper
N. Siddharth
90
1
0
25 Jul 2024
Transformer to CNN: Label-scarce distillation for efficient text classification
Yew Ken Chia
Sam Witteveen
Martin Andrews
29
37
0
08 Sep 2019
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
Yu Sun
Shuohuan Wang
Yukun Li
Shikun Feng
Hao Tian
Hua Wu
Haifeng Wang
CLL
75
804
0
29 Jul 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
408
24,160
0
26 Jul 2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark
Minh-Thang Luong
Urvashi Khandelwal
Christopher D. Manning
Quoc V. Le
49
229
0
10 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
183
8,386
0
19 Jun 2019
Variational Pretraining for Semi-supervised Text Classification
Suchin Gururangan
T. Dang
Dallas Card
Noah A. Smith
VLM
35
111
0
05 Jun 2019
Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System
Ze Yang
Linjun Shou
Ming Gong
Wutao Lin
Daxin Jiang
KELM
18
20
0
21 Apr 2019
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael Tang
Yao Lu
Linqing Liu
Lili Mou
Olga Vechtomova
Jimmy J. Lin
54
419
0
28 Mar 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
964
93,936
0
11 Oct 2018
Attention-Guided Answer Distillation for Machine Reading Comprehension
Minghao Hu
Yuxing Peng
Furu Wei
Zhen Huang
Dongsheng Li
Nan Yang
M. Zhou
FaML
48
75
0
23 Aug 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
658
7,080
0
20 Apr 2018
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
106
1,167
0
06 Mar 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
111
11,520
0
15 Feb 2018
Few-shot learning of neural networks from scratch by pseudo example optimization
Akisato Kimura
Zoubin Ghahramani
Koh Takeuchi
Tomoharu Iwata
N. Ueda
48
52
0
08 Feb 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
453
129,831
0
12 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
405
4,444
0
18 Apr 2017
Sequence-Level Knowledge Distillation
Yoon Kim
Alexander M. Rush
84
1,109
0
25 Jun 2016
Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering
Ruining He
Julian McAuley
98
2,048
0
04 Feb 2016
Semi-supervised Sequence Learning
Andrew M. Dai
Quoc V. Le
SSL
104
1,232
0
04 Nov 2015
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han
Huizi Mao
W. Dally
3DGS
203
8,793
0
01 Oct 2015
A large annotated corpus for learning natural language inference
Samuel R. Bowman
Gabor Angeli
Christopher Potts
Christopher D. Manning
229
4,268
0
21 Aug 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Yukun Zhu
Ryan Kiros
R. Zemel
Ruslan Salakhutdinov
R. Urtasun
Antonio Torralba
Sanja Fidler
105
2,529
0
22 Jun 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
241
19,523
0
09 Mar 2015
Deep Learning with Limited Numerical Precision
Suyog Gupta
A. Agrawal
K. Gopalakrishnan
P. Narayanan
HAI
134
2,043
0
09 Feb 2015
FitNets: Hints for Thin Deep Nets
Adriana Romero
Nicolas Ballas
Samira Ebrahimi Kahou
Antoine Chassang
C. Gatta
Yoshua Bengio
FedML
236
3,862
0
19 Dec 2014
Do Deep Nets Really Need to be Deep?
Lei Jimmy Ba
R. Caruana
151
2,114
0
21 Dec 2013
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI
OCL
300
33,445
0
16 Oct 2013