1910.10683
Cited By
v4 (latest)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Journal of Machine Learning Research (JMLR), 2019
23 October 2019
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
Papers citing
"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
50 / 12,032 papers shown
A Primer in BERTology: What we know about how BERT works
Transactions of the Association for Computational Linguistics (TACL), 2020
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
474
1,717
0
27 Feb 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Transactions of the Association for Computational Linguistics (TACL), 2020
Prakhar Ganesh
Yao Chen
Xin Lou
Mohammad Ali Khan
Yifan Yang
Hassan Sajjad
Preslav Nakov
Deming Chen
Marianne Winslett
AI4CE
437
213
0
27 Feb 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
293
152
0
26 Feb 2020
On Feature Normalization and Data Augmentation
Computer Vision and Pattern Recognition (CVPR), 2020
Boyi Li
Felix Wu
Ser-Nam Lim
Serge J. Belongie
Kilian Q. Weinberger
230
156
0
25 Feb 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Neural Information Processing Systems (NeurIPS), 2020
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
1.3K
1,757
0
25 Feb 2020
Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Journal of Computer Science and Technology (JCST), 2020
Yige Xu
Xipeng Qiu
L. Zhou
Xuanjing Huang
150
73
0
24 Feb 2020
Training Question Answering Models From Synthetic Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Raul Puri
Ryan Spring
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ELM
195
169
0
22 Feb 2020
Modelling Latent Skills for Multitask Language Generation
Kris Cao
Dani Yogatama
140
3
0
21 Feb 2020
Fast local linear regression with anchor regularization
Mathis Petrovich
M. Yamada
OffRL
157
3
0
21 Feb 2020
A Road Map to Strong Intelligence
Philip Paquette
AI4TS
45
0
0
20 Feb 2020
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Findings of the Association for Computational Linguistics (Findings), 2020
Zhangyin Feng
Daya Guo
Duyu Tang
Nan Duan
Xiaocheng Feng
...
Linjun Shou
Bing Qin
Ting Liu
Daxin Jiang
Ming Zhou
1.2K
3,386
0
19 Feb 2020
LAMBERT: Layout-Aware (Language) Modeling for information extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2020
Lukasz Garncarek
Rafal Powalski
Tomasz Stanislawek
Bartosz Topolski
Piotr Halama
M. Turski
Filip Graliński
335
95
0
19 Feb 2020
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Xiaodong Liu
Yu Wang
Jianshu Ji
Hao Cheng
Xueyun Zhu
...
Pengcheng He
Weizhu Chen
Hoifung Poon
Guihong Cao
Jianfeng Gao
AI4CE
182
62
0
19 Feb 2020
Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna
N. Arivazhagan
Orhan Firat
219
34
0
17 Feb 2020
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo
Lei Ji
Ding Wang
Haoyang Huang
Nan Duan
Tianrui Li
Jason Li
Xilin Chen
Ming Zhou
VLM
375
417
0
15 Feb 2020
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge
Gabriel Ilharco
Roy Schwartz
Ali Farhadi
Hannaneh Hajishirzi
Noah A. Smith
282
676
0
15 Feb 2020
TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu
Jian Jiao
Ruofei Zhang
188
53
0
14 Feb 2020
Transformer on a Diet
Chenguang Wang
Zihao Ye
Aston Zhang
Zheng Zhang
Alex Smola
220
9
0
14 Feb 2020
CBAG: Conditional Biomedical Abstract Generation
PLoS ONE (PLOS ONE), 2020
Justin Sybrandt
Ilya Safro
MedIm
AI4CE
146
10
0
13 Feb 2020
GLU Variants Improve Transformer
Noam M. Shazeer
585
1,477
0
12 Feb 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Adam Roberts
Colin Raffel
Noam M. Shazeer
KELM
573
993
0
10 Feb 2020
REALM: Retrieval-Augmented Language Model Pre-Training
International Conference on Machine Learning (ICML), 2020
Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming-Wei Chang
RALM
1.2K
2,611
0
10 Feb 2020
Semi-Supervised Class Discovery
Jeremy Nixon
J. Liu
David Berthelot
266
2
0
10 Feb 2020
Momentum Improves Normalized SGD
International Conference on Machine Learning (ICML), 2020
Ashok Cutkosky
Harsh Mehta
ODL
450
159
0
09 Feb 2020
Segmented Graph-Bert for Graph Instance Modeling
Jiawei Zhang
SSeg
129
6
0
09 Feb 2020
Description Based Text Classification with Reinforcement Learning
International Conference on Machine Learning (ICML), 2020
Duo Chai
Wei Wu
Qinghong Han
Leilei Gan
Jiwei Li
VLM
389
70
0
08 Feb 2020
Aligning the Pretraining and Finetuning Objectives of Language Models
Nuo Wang Pierse
Jing Lu
AI4CE
99
2
0
05 Feb 2020
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
Findings of the Association for Computational Linguistics (Findings), 2020
Ruize Wang
Duyu Tang
Nan Duan
Zhongyu Wei
Xuanjing Huang
Jianshu Ji
Guihong Cao
Daxin Jiang
Ming Zhou
KELM
577
595
0
05 Feb 2020
DUMA: Reading Comprehension with Transposition Thinking
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Q. Hu
Hai Zhao
Xiaoguang Li
AI4CE
408
37
0
26 Jan 2020
ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Dongling Xiao
Han Zhang
Yukun Li
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
215
133
0
26 Jan 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
1.8K
6,691
0
23 Jan 2020
Multilingual Denoising Pre-training for Neural Machine Translation
Transactions of the Association for Computational Linguistics (TACL), 2020
Yinhan Liu
Jiatao Gu
Naman Goyal
Xian Li
Sergey Edunov
Marjan Ghazvininejad
M. Lewis
Luke Zettlemoyer
AI4CE
AIMat
899
1,982
0
22 Jan 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu
Yujia Zhai
Zizhong Chen
133
0
0
22 Jan 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Neural Information Processing Systems (NeurIPS), 2020
Kihyuk Sohn
David Berthelot
Chun-Liang Li
Zizhao Zhang
Nicholas Carlini
E. D. Cubuk
Alexey Kurakin
Han Zhang
Colin Raffel
AAML
451
4,275
0
21 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Timo Schick
Hinrich Schütze
1.1K
1,758
0
21 Jan 2020
Length-controllable Abstractive Summarization by Guiding with Summary Prototype
Itsumi Saito
Kyosuke Nishida
Kosuke Nishida
Atsushi Otsuka
Hisako Asano
J. Tomita
Hiroyuki Shindo
Yuji Matsumoto
248
37
0
21 Jan 2020
A multimodal deep learning approach for named entity recognition from social media
M. Asgari-Chenaghlu
M. Feizi-Derakhshi
Leili Farzinvash
M. Balafar
C. Motamed
272
36
0
19 Jan 2020
RobBERT: a Dutch RoBERTa-based Language Model
Findings of the Association for Computational Linguistics (Findings), 2020
Pieter Delobelle
Thomas Winters
Bettina Berendt
198
262
0
17 Jan 2020
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Findings of the Association for Computational Linguistics (Findings), 2020
Weizhen Qi
Yu Yan
Yeyun Gong
Dayiheng Liu
Nan Duan
Jiusheng Chen
Ruofei Zhang
Ming Zhou
AI4TS
367
472
0
13 Jan 2020
Learning Accurate Integer Transformer Machine-Translation Models
SN Computer Science (SN Comput. Sci.), 2020
Ephrem Wu
97
4
0
03 Jan 2020
What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
Transactions of the Association for Computational Linguistics (TACL), 2019
Kyle Richardson
Ashish Sabharwal
243
47
0
31 Dec 2019
All-in-One Image-Grounded Conversational Agents
Da Ju
Kurt Shuster
Y-Lan Boureau
Jason Weston
LLMAG
147
9
0
28 Dec 2019
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
International Conference on Language Resources and Evaluation (LREC), 2019
Israfel Salazar
Mary Dabre
Atsushi Fujita
Sadao Kurohashi
210
6
0
26 Dec 2019
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
International Conference on Machine Learning (ICML), 2019
Jingqing Zhang
Yao-Min Zhao
Mohammad Saleh
Peter J. Liu
RALM
3DGS
842
2,310
0
18 Dec 2019
Multilingual is not enough: BERT for Finnish
Antti Virtanen
Jenna Kanerva
Rami Ilo
Jouni Luoma
Juhani Luotolahti
T. Salakoski
Filip Ginter
S. Pyysalo
250
300
0
15 Dec 2019
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian
A. Kreuzer
Pai-Hung Chen
Hans-Martin Will
VLM
167
3
0
13 Dec 2019
Extending Machine Language Models toward Human-Level Language Understanding
James L. McClelland
Felix Hill
Maja R. Rudolph
Jason Baldridge
Hinrich Schütze
LRM
156
36
0
12 Dec 2019
FlauBERT: Unsupervised Language Model Pre-training for French
International Conference on Language Resources and Evaluation (LREC), 2019
Hang Le
Loïc Vial
Jibril Frej
Vincent Segonne
Maximin Coavoux
Benjamin Lecouteux
A. Allauzen
Benoît Crabbé
Laurent Besacier
D. Schwab
AI4CE
340
431
0
11 Dec 2019
Zero-shot Text Classification With Generative Language Models
Raul Puri
Bryan Catanzaro
VLM
166
116
0
10 Dec 2019
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
European Conference on Computer Vision (ECCV), 2019
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
349
120
0
05 Dec 2019