Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu
arXiv:2005.07310 (v2), 15 May 2020 [VLM]
Papers citing "Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models" (9 of 59 papers shown)
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
17 Feb 2021 [VLM]

Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks, John F. J. Mellor, R. Schneider, Jean-Baptiste Alayrac, Aida Nematzadeh
31 Jan 2021

Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge
Violetta Shevchenko, Damien Teney, A. Dick, Anton Van Den Hengel
15 Jan 2021

Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu, Albert Gatt, Anette Frank, Iacer Calixto
22 Dec 2020 [LRM]

Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution
Ron Zhu
15 Dec 2020

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, D. Florêncio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
08 Dec 2020 [VLM]

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li, Hanyin Wang, Yuan Luo
03 Sep 2020

Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu
11 Jun 2020 [ObjD, VLM]

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala, M. Kalimuthu, Dietrich Klakow
22 Jul 2019 [VLM]