Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

30 November 2020
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
arXiv:2011.15124

Papers citing "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"

50 of 72 papers shown

Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Lifeng Qiao, Peng Ye, Yuchen Ren, Weiqiang Bai, Chaoqi Liang, Xinzhu Ma, Nanqing Dong, W. Ouyang
18 Dec 2024

Do Language Models Understand Time?
Xi Ding, Lei Wang
18 Dec 2024

Unified Framework for Open-World Compositional Zero-shot Learning
Hirunima Jayasekara, Khoi Pham, Nirat Saini, Abhinav Shrivastava
05 Dec 2024

Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields, C. Kennington
11 Nov 2024 · VLM

VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models
Harshit, Tolga Tasdizen
06 Oct 2024 · CoGe, VLM

Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara, Lukas Klein, Carsten T. Lüth, Paul Jäger, Hendrik Strobelt, Mennatallah El-Assady
02 Oct 2024

ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah, Amirmohammad Izadi, Armin Saghafian, Reza Vahidimajd, Mohammad Mozafari, Amirreza Mirzaei, Mohammadmahdi Samiei, M. Baghshah
12 Sep 2024 · CoGe, VLM

CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
Ivana Beňová, Michal Gregor, Albert Gatt
02 Sep 2024

BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval
Zhenyu Lu, Lakshay Sethi
19 Aug 2024

MuTT: A Multimodal Trajectory Transformer for Robot Skills
Claudius Kienle, Benjamin Alt, Onur Celik, P. Becker, Darko Katic, Rainer Jäkel, Gerhard Neumann
22 Jul 2024

How and where does CLIP process negation?
Vincent Quantmeyer, Pablo Mosteiro, Albert Gatt
15 Jul 2024 · CoGe

GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations
Rick Wilming, Artur Dox, Hjalmar Schulz, Marta Oliveira, Benedict Clark, Stefan Haufe
17 Jun 2024

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget, Lucas Beyer, Emanuele Bugliarello, Xiao Wang, Andreas Steiner, Xiao-Qi Zhai, Ibrahim M. Alabdulmohsin
22 May 2024 · VLM

Acquiring Linguistic Knowledge from Multimodal Input
Theodor Amariucai, Alexander Scott Warstadt
27 Feb 2024 · CLL

Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking
Ivana Beňová, Jana Kosecka, Michal Gregor, Martin Tamajka, Marcel Veselý, Marián Simko
29 Jan 2024

Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Laura Cabello, Emanuele Bugliarello, Stephanie Brandl, Desmond Elliott
26 Oct 2023

The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Xinyi Chen, Raquel Fernández, Sandro Pezzelle
23 Oct 2023 · VLM

On the Language Encoder of Contrastive Cross-modal Models
Mengjie Zhao, Junya Ono, Zhi-Wei Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji
20 Oct 2023 · VLM

A Survey on Image-text Multimodal Models
Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, Khai Le-Duc, Guiyong Chang, Dawei Liu, Sibo Zhang, Zhengbing Yao, Mingjun Xu, Liping Bu
23 Sep 2023 · VLM

The Scenario Refiner: Grounding subjects in images at the morphological level
Claudia Tagliaferri, Sofia Axioti, Albert Gatt, Denis Paperno
20 Sep 2023

Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
Fei-Yue Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding
24 Aug 2023

Generic Attention-model Explainability by Weighted Relevance Accumulation
Yiming Huang, Ao Jia, Xiaodan Zhang, Jiawei Zhang
20 Aug 2023

Vision Language Transformers: A Survey
Clayton Fields, C. Kennington
06 Jul 2023 · VLM

Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Qingpei Guo, Kaisheng Yao, Wei Chu
25 Jun 2023 · MLLM

Zero-shot Composed Text-Image Retrieval
Yikun Liu, Jiangchao Yao, Ya-Qin Zhang, Yanfeng Wang, Weidi Xie
12 Jun 2023

Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang, Zihao Deng, Martin Q. Ma, James Y. Zou, Louis-Philippe Morency, Ruslan Salakhutdinov
08 Jun 2023 · SSL

Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello, Aida Nematzadeh, Lisa Anne Hendricks
23 May 2023 · SSL

Semantic Composition in Visually Grounded Language Models
Rohan Pandey
15 May 2023 · CoGe

Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh
12 May 2023 · VLM

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang
08 May 2023 · LRM

Multimodal Understanding Through Correlation Maximization and Minimization
Yi Shi, Marc Niethammer
04 May 2023

3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
Siming Yan, Yu-Qi Yang, Yu-Xiao Guo, Hao Pan, Peng-shuai Wang, Xin Tong, Yang Liu, Qi-Xing Huang
14 Apr 2023 · 3DPC

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer, Bo Wan, Gagan Madan, Filip Pavetić, Andreas Steiner, ..., Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai
30 Mar 2023

A Two-Sided Discussion of Preregistration of NLP Research
Anders Søgaard, Daniel Hershcovich, Miryam de Lhoneux
20 Feb 2023 · OnRL, AI4CE

BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
28 Jan 2023 · CoGe

Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner, O. Ferret, C. Guinaudeau
11 Jan 2023

Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment
Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency
20 Dec 2022

Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago, A. Piergiovanni
02 Dec 2022

Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
Michele Cafagna, Kees van Deemter, Albert Gatt
09 Nov 2022 · CoGe

Training Vision-Language Models with Less Bimodal Supervision
Elad Segal, Ben Bogin, Jonathan Berant
01 Nov 2022 · VLM

Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
T. Wang, Jorma T. Laaksonen, T. Langer, Heikki Arponen, Tom E. Bishop
24 Oct 2022 · VLM

Multilingual Multimodal Learning with Machine Translated Text
Chen Qiu, Dan Oneaţă, Emanuele Bugliarello, Stella Frank, Desmond Elliott
24 Oct 2022

Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Mitja Nikolaus, Emmanuelle Salin, Stéphane Ayache, Abdellah Fourtassi, Benoit Favre
21 Oct 2022

LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
Hongcheng Guo, Jiaheng Liu, Haoyang Huang, Jian Yang, Zhoujun Li, Dongdong Zhang, Zheng Cui, Furu Wei
19 Oct 2022

One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks
Gregor Geigle, Chen Cecilia Liu, Jonas Pfeiffer, Iryna Gurevych
12 Oct 2022 · VLM

How to Adapt Pre-trained Vision-and-Language Models to a Text-only Input?
Lovisa Hagström, Richard Johansson
19 Sep 2022 · VLM

FashionViL: Fashion-Focused Vision-and-Language Representation Learning
Xiaoping Han, Licheng Yu, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang
17 Jul 2022 · AI4TS

Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh
24 May 2022 · OOD

Visual Spatial Reasoning
Fangyu Liu, Guy Edward Toh Emerson, Nigel Collier
30 Apr 2022 · ReLM

Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Yuchen Cui, S. Niekum, Abhi Gupta, Vikash Kumar, Aravind Rajeswaran
23 Apr 2022 · LM&Ro