Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal of Machine Learning Research (JMLR), 2019
23 October 2019
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
AIMat

Papers citing "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Showing 50 of 12,032 citing papers.
A Primer in BERTology: What we know about how BERT works
Transactions of the Association for Computational Linguistics (TACL), 2020
Anna Rogers, Olga Kovaleva, Anna Rumshisky
OffRL · 474 · 1,717 · 0 · 27 Feb 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Transactions of the Association for Computational Linguistics (TACL), 2020
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
AI4CE · 437 · 213 · 0 · 27 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
293 · 152 · 0 · 26 Feb 2020

On Feature Normalization and Data Augmentation
Computer Vision and Pattern Recognition (CVPR), 2020
Boyi Li, Felix Wu, Ser-Nam Lim, Serge J. Belongie, Kilian Q. Weinberger
230 · 156 · 0 · 25 Feb 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Neural Information Processing Systems (NeurIPS), 2020
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
VLM · 1.3K · 1,757 · 0 · 25 Feb 2020

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Journal of Computer Science and Technology (JCST), 2020
Yige Xu, Xipeng Qiu, L. Zhou, Xuanjing Huang
150 · 73 · 0 · 24 Feb 2020

Training Question Answering Models From Synthetic Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Raul Puri, Ryan Spring, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro
ELM · 195 · 169 · 0 · 22 Feb 2020

Modelling Latent Skills for Multitask Language Generation
Kris Cao, Dani Yogatama
140 · 3 · 0 · 21 Feb 2020

Fast local linear regression with anchor regularization
Mathis Petrovich, M. Yamada
OffRL · 157 · 3 · 0 · 21 Feb 2020

A Road Map to Strong Intelligence
Philip Paquette
AI4TS · 45 · 0 · 0 · 20 Feb 2020
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Findings, 2020
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, ..., Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou
1.2K · 3,386 · 0 · 19 Feb 2020

LAMBERT: Layout-Aware (Language) Modeling for information extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2020
Lukasz Garncarek, Rafal Powalski, Tomasz Stanislawek, Bartosz Topolski, Piotr Halama, M. Turski, Filip Graliński
335 · 95 · 0 · 19 Feb 2020

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, ..., Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
AI4CE · 182 · 62 · 0 · 19 Feb 2020

Controlling Computation versus Quality for Neural Sequence Models
Ankur Bapna, N. Arivazhagan, Orhan Firat
219 · 34 · 0 · 17 Feb 2020
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo, Lei Ji, Ding Wang, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Xilin Chen, Ming Zhou
VLM · 375 · 417 · 0 · 15 Feb 2020

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith
282 · 676 · 0 · 15 Feb 2020

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu, Jian Jiao, Ruofei Zhang
188 · 53 · 0 · 14 Feb 2020

Transformer on a Diet
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alex Smola
220 · 9 · 0 · 14 Feb 2020

CBAG: Conditional Biomedical Abstract Generation
PLOS ONE, 2020
Justin Sybrandt, Ilya Safro
MedIm, AI4CE · 146 · 10 · 0 · 13 Feb 2020
GLU Variants Improve Transformer
Noam M. Shazeer
585 · 1,477 · 0 · 12 Feb 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Adam Roberts, Colin Raffel, Noam M. Shazeer
KELM · 573 · 993 · 0 · 10 Feb 2020

REALM: Retrieval-Augmented Language Model Pre-Training
International Conference on Machine Learning (ICML), 2020
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
RALM · 1.2K · 2,611 · 0 · 10 Feb 2020

Semi-Supervised Class Discovery
Jeremy Nixon, J. Liu, David Berthelot
266 · 2 · 0 · 10 Feb 2020

Momentum Improves Normalized SGD
International Conference on Machine Learning (ICML), 2020
Ashok Cutkosky, Harsh Mehta
ODL · 450 · 159 · 0 · 09 Feb 2020

Segmented Graph-Bert for Graph Instance Modeling
Jiawei Zhang
SSeg · 129 · 6 · 0 · 09 Feb 2020
Description Based Text Classification with Reinforcement Learning
International Conference on Machine Learning (ICML), 2020
Duo Chai, Wei Wu, Qinghong Han, Leilei Gan, Jiwei Li
VLM · 389 · 70 · 0 · 08 Feb 2020

Aligning the Pretraining and Finetuning Objectives of Language Models
Nuo Wang Pierse, Jing Lu
AI4CE · 99 · 2 · 0 · 05 Feb 2020

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
Findings, 2020
Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
KELM · 577 · 595 · 0 · 05 Feb 2020

DUMA: Reading Comprehension with Transposition Thinking
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2020
Q. Hu, Hai Zhao, Xiaoguang Li
AI4CE · 408 · 37 · 0 · 26 Jan 2020

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Dongling Xiao, Han Zhang, Yukun Li, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
215 · 133 · 0 · 26 Jan 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
1.8K · 6,691 · 0 · 23 Jan 2020

Multilingual Denoising Pre-training for Neural Machine Translation
Transactions of the Association for Computational Linguistics (TACL), 2020
Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, M. Lewis, Luke Zettlemoyer
AI4CE, AIMat · 899 · 1,982 · 0 · 22 Jan 2020

Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu, Yujia Zhai, Zizhong Chen
133 · 0 · 0 · 22 Jan 2020

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Neural Information Processing Systems (NeurIPS), 2020
Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, E. D. Cubuk, Alexey Kurakin, Han Zhang, Colin Raffel
AAML · 451 · 4,275 · 0 · 21 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Timo Schick, Hinrich Schütze
1.1K · 1,758 · 0 · 21 Jan 2020

Length-controllable Abstractive Summarization by Guiding with Summary Prototype
Itsumi Saito, Kyosuke Nishida, Kosuke Nishida, Atsushi Otsuka, Hisako Asano, J. Tomita, Hiroyuki Shindo, Yuji Matsumoto
248 · 37 · 0 · 21 Jan 2020

A multimodal deep learning approach for named entity recognition from social media
M. Asgari-Chenaghlu, M. Feizi-Derakhshi, Leili Farzinvash, M. Balafar, C. Motamed
272 · 36 · 0 · 19 Jan 2020

RobBERT: a Dutch RoBERTa-based Language Model
Findings, 2020
Pieter Delobelle, Thomas Winters, Bettina Berendt
198 · 262 · 0 · 17 Jan 2020

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Findings, 2020
Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou
AI4TS · 367 · 472 · 0 · 13 Jan 2020

Learning Accurate Integer Transformer Machine-Translation Models
SN Computer Science, 2020
Ephrem Wu
97 · 4 · 0 · 03 Jan 2020
What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
Transactions of the Association for Computational Linguistics (TACL), 2019
Kyle Richardson, Ashish Sabharwal
243 · 47 · 0 · 31 Dec 2019

All-in-One Image-Grounded Conversational Agents
Da Ju, Kurt Shuster, Y-Lan Boureau, Jason Weston
LLMAG · 147 · 9 · 0 · 28 Dec 2019

Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation
International Conference on Language Resources and Evaluation (LREC), 2019
Israfel Salazar, Mary Dabre, Atsushi Fujita, Sadao Kurohashi
210 · 6 · 0 · 26 Dec 2019

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
International Conference on Machine Learning (ICML), 2019
Jingqing Zhang, Yao-Min Zhao, Mohammad Saleh, Peter J. Liu
RALM, 3DGS · 842 · 2,310 · 0 · 18 Dec 2019

Multilingual is not enough: BERT for Finnish
Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, T. Salakoski, Filip Ginter, S. Pyysalo
250 · 300 · 0 · 15 Dec 2019
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian, A. Kreuzer, Pai-Hung Chen, Hans-Martin Will
VLM · 167 · 3 · 0 · 13 Dec 2019

Extending Machine Language Models toward Human-Level Language Understanding
James L. McClelland, Felix Hill, Maja R. Rudolph, Jason Baldridge, Hinrich Schütze
LRM · 156 · 36 · 0 · 12 Dec 2019

FlauBERT: Unsupervised Language Model Pre-training for French
International Conference on Language Resources and Evaluation (LREC), 2019
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, A. Allauzen, Benoît Crabbé, Laurent Besacier, D. Schwab
AI4CE · 340 · 431 · 0 · 11 Dec 2019

Zero-shot Text Classification With Generative Language Models
Raul Puri, Bryan Catanzaro
VLM · 166 · 116 · 0 · 10 Dec 2019

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
European Conference on Computer Vision (ECCV), 2019
Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das
VLM · 349 · 120 · 0 · 05 Dec 2019