Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

30 November 2020

Papers citing "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"

22 / 72 papers shown

Title
Vision-and-Language Pretrained Models: A Survey Siqu Long Feiqi Cao S. Han Haiqing Yang VLM 14 63 0 15 Apr 2022
Image Retrieval from Contextual Descriptions Benno Krojer Vaibhav Adlakha Vibhav Vineet Yash Goyal E. Ponti Siva Reddy 11 29 0 29 Mar 2022
Finding Structural Knowledge in Multimodal-BERT Victor Milewski Miryam de Lhoneux Marie-Francine Moens 12 9 0 17 Mar 2022
Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention Hou Pong Chan M. Guo Chengguang Xu 8 4 0 14 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models Feng Li Hao Zhang Yi-Fan Zhang S. Liu Jian Guo L. Ni Pengchuan Zhang Lei Zhang AI4TS VLM 11 36 0 03 Mar 2022
Kernelized Concept Erasure Shauli Ravfogel Francisco Vargas Yoav Goldberg Ryan Cotterell 9 32 0 28 Jan 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages Emanuele Bugliarello Fangyu Liu Jonas Pfeiffer Siva Reddy Desmond Elliott E. Ponti Ivan Vulić MLLM VLM ELM 27 62 0 27 Jan 2022
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision Ajinkya Tejankar Maziar Sanjabi Bichen Wu Saining Xie Madian Khabsa Hamed Pirsiavash Hamed Firooz VLM 13 17 0 27 Dec 2021
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning Keng Ji Chow Samson Tan MingSung Kan LRM 13 4 0 21 Nov 2021
MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition Jinming Zhao Ruichen Li Qin Jin Xinchao Wang Haizhou Li 19 25 0 27 Oct 2021
Visually Grounded Reasoning across Languages and Cultures Fangyu Liu Emanuele Bugliarello E. Ponti Siva Reddy Nigel Collier Desmond Elliott VLM LRM 92 167 0 28 Sep 2021
COVR: A test-bed for Visually Grounded Compositional Generalization with real images Ben Bogin Shivanshu Gupta Matt Gardner Jonathan Berant CoGe 29 29 0 22 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering Ander Salaberria Gorka Azkune Oier López de Lacalle Aitor Soroa Etxabe Eneko Agirre 17 58 0 15 Sep 2021
xGQA: Cross-Lingual Visual Question Answering Jonas Pfeiffer Gregor Geigle Aishwarya Kamath Jan-Martin O. Steitz Stefan Roth Ivan Vulić Iryna Gurevych 16 56 0 13 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers Stella Frank Emanuele Bugliarello Desmond Elliott 17 82 0 09 Sep 2021
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training Jong Hak Moon HyunGyung Lee W. Shin Young-Hak Kim E. Choi MedIm 9 147 0 24 May 2021
VisQA: X-raying Vision and Language Reasoning in Transformers Theo Jaunet Corentin Kervadec Romain Vuillemot G. Antipov M. Baccouche Christian Wolf 8 26 0 02 Apr 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval Gregor Geigle Jonas Pfeiffer Nils Reimers Ivan Vulić Iryna Gurevych 19 59 0 22 Mar 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA Luowei Zhou Hamid Palangi Lei Zhang Houdong Hu Jason J. Corso Jianfeng Gao MLLM VLM 250 922 0 24 Sep 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets Mor Geva Yoav Goldberg Jonathan Berant 235 319 0 21 Aug 2019
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 261 10,106 0 16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,724 0 26 Sep 2016