Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

23 August 2019

Papers citing "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models"


To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay Soumik Dey Hansi Wu Binbin Li 68 1 0 07 May 2025
TRA: Better Length Generalisation with Threshold Relative Attention Mattia Opper Roland Fernandez P. Smolensky Jianfeng Gao 75 0 0 29 Mar 2025
Banyan: Improved Representation Learning with Explicit Structure Mattia Opper N. Siddharth 90 1 0 25 Jul 2024
Transformer to CNN: Label-scarce distillation for efficient text classification Yew Ken Chia Sam Witteveen Martin Andrews 29 37 0 08 Sep 2019
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding Yu Sun Shuohuan Wang Yukun Li Shikun Feng Hao Tian Hua Wu Haifeng Wang CLL 75 804 0 29 Jul 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis Luke Zettlemoyer Veselin Stoyanov AIMat 408 24,160 0 26 Jul 2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding Kevin Clark Minh-Thang Luong Urvashi Khandelwal Christopher D. Manning Quoc V. Le 49 229 0 10 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding Zhilin Yang Zihang Dai Yiming Yang J. Carbonell Ruslan Salakhutdinov Quoc V. Le AI4CE 183 8,386 0 19 Jun 2019
Variational Pretraining for Semi-supervised Text Classification Suchin Gururangan T. Dang Dallas Card Noah A. Smith VLM 35 111 0 05 Jun 2019
Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System Ze Yang Linjun Shou Ming Gong Wutao Lin Daxin Jiang KELM 18 20 0 21 Apr 2019
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks Raphael Tang Yao Lu Linqing Liu Lili Mou Olga Vechtomova Jimmy J. Lin 54 419 0 28 Mar 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 964 93,936 0 11 Oct 2018
Attention-Guided Answer Distillation for Machine Reading Comprehension Minghao Hu Yuxing Peng Furu Wei Zhen Huang Dongsheng Li Nan Yang M. Zhou FaML 48 75 0 23 Aug 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 658 7,080 0 20 Apr 2018
Annotation Artifacts in Natural Language Inference Data Suchin Gururangan Swabha Swayamdipta Omer Levy Roy Schwartz Samuel R. Bowman Noah A. Smith 106 1,167 0 06 Mar 2018
Deep contextualized word representations Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee Luke Zettlemoyer NAI 111 11,520 0 15 Feb 2018
Few-shot learning of neural networks from scratch by pseudo example optimization Akisato Kimura Zoubin Ghahramani Koh Takeuchi Tomoharu Iwata N. Ueda 48 52 0 08 Feb 2018
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 453 129,831 0 12 Jun 2017
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference Adina Williams Nikita Nangia Samuel R. Bowman 405 4,444 0 18 Apr 2017
Sequence-Level Knowledge Distillation Yoon Kim Alexander M. Rush 84 1,109 0 25 Jun 2016
Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering Ruining He Julian McAuley 98 2,048 0 04 Feb 2016
Semi-supervised Sequence Learning Andrew M. Dai Quoc V. Le SSL 104 1,232 0 04 Nov 2015
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding Song Han Huizi Mao W. Dally 3DGS 203 8,793 0 01 Oct 2015
A large annotated corpus for learning natural language inference Samuel R. Bowman Gabor Angeli Christopher Potts Christopher D. Manning 229 4,268 0 21 Aug 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books Yukun Zhu Ryan Kiros R. Zemel Ruslan Salakhutdinov R. Urtasun Antonio Torralba Sanja Fidler 105 2,529 0 22 Jun 2015
Distilling the Knowledge in a Neural Network Geoffrey E. Hinton Oriol Vinyals J. Dean FedML 241 19,523 0 09 Mar 2015
Deep Learning with Limited Numerical Precision Suyog Gupta A. Agrawal K. Gopalakrishnan P. Narayanan HAI 134 2,043 0 09 Feb 2015
FitNets: Hints for Thin Deep Nets Adriana Romero Nicolas Ballas Samira Ebrahimi Kahou Antoine Chassang C. Gatta Yoshua Bengio FedML 236 3,862 0 19 Dec 2014
Do Deep Nets Really Need to be Deep? Lei Jimmy Ba R. Caruana 151 2,114 0 21 Dec 2013
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 300 33,445 0 16 Oct 2013