ResearchTrend.AI
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
arXiv: 2005.07310 (v2, latest) | 15 May 2020
Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, Jingjing Liu
Tags: VLM

Papers citing "Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models"

9 of 59 citing papers shown.
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
Tags: VLM | 17 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks, John F. J. Mellor, R. Schneider, Jean-Baptiste Alayrac, Aida Nematzadeh
31 Jan 2021
Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge
Violetta Shevchenko, Damien Teney, A. Dick, Anton Van Den Hengel
15 Jan 2021
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu, Albert Gatt, Anette Frank, Iacer Calixto
Tags: LRM | 22 Dec 2020
Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution
Ron Zhu
15 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, D. Florêncio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
Tags: VLM | 08 Dec 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li, Hanyin Wang, Yuan Luo
03 Sep 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu
Tags: ObjD, VLM | 11 Jun 2020
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala, M. Kalimuthu, Dietrich Klakow
Tags: VLM | 22 Jul 2019