ResearchTrend.AI
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

23 August 2019
Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Papers citing "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models"

30 / 30 papers shown
  • To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay. Soumik Dey, Hansi Wu, Binbin Li. 07 May 2025.
  • TRA: Better Length Generalisation with Threshold Relative Attention. Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao. 29 Mar 2025.
  • Banyan: Improved Representation Learning with Explicit Structure. Mattia Opper, N. Siddharth. 25 Jul 2024.
  • Transformer to CNN: Label-scarce distillation for efficient text classification. Yew Ken Chia, Sam Witteveen, Martin Andrews. 08 Sep 2019.
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang. 29 Jul 2019.
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. 26 Jul 2019.
  • BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le. 10 Jul 2019.
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding. Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le. 19 Jun 2019.
  • Variational Pretraining for Semi-supervised Text Classification. Suchin Gururangan, T. Dang, Dallas Card, Noah A. Smith. 05 Jun 2019.
  • Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System. Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. 21 Apr 2019.
  • Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin. 28 Mar 2019.
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
  • Attention-Guided Answer Distillation for Machine Reading Comprehension. Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, M. Zhou. 23 Aug 2018.
  • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
  • Annotation Artifacts in Natural Language Inference Data. Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith. 06 Mar 2018.
  • Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. 15 Feb 2018.
  • Few-shot learning of neural networks from scratch by pseudo example optimization. Akisato Kimura, Zoubin Ghahramani, Koh Takeuchi, Tomoharu Iwata, N. Ueda. 08 Feb 2018.
  • Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017.
  • A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. Adina Williams, Nikita Nangia, Samuel R. Bowman. 18 Apr 2017.
  • Sequence-Level Knowledge Distillation. Yoon Kim, Alexander M. Rush. 25 Jun 2016.
  • Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. Ruining He, Julian McAuley. 04 Feb 2016.
  • Semi-supervised Sequence Learning. Andrew M. Dai, Quoc V. Le. 04 Nov 2015.
  • Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. Song Han, Huizi Mao, W. Dally. 01 Oct 2015.
  • A large annotated corpus for learning natural language inference. Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning. 21 Aug 2015.
  • Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. Yukun Zhu, Ryan Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler. 22 Jun 2015.
  • Distilling the Knowledge in a Neural Network. Geoffrey E. Hinton, Oriol Vinyals, J. Dean. 09 Mar 2015.
  • Deep Learning with Limited Numerical Precision. Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan. 09 Feb 2015.
  • FitNets: Hints for Thin Deep Nets. Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, C. Gatta, Yoshua Bengio. 19 Dec 2014.
  • Do Deep Nets Really Need to be Deep? Lei Jimmy Ba, R. Caruana. 21 Dec 2013.
  • Distributed Representations of Words and Phrases and their Compositionality. Tomas Mikolov, Ilya Sutskever, Kai Chen, G. Corrado, J. Dean. 16 Oct 2013.