Cited By

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
arXiv:2011.15124, 30 November 2020 (ArXiv | PDF | HTML)

Papers citing "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"
Showing 50 of 72 citing papers.
1. Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
   Lifeng Qiao, Peng Ye, Yuchen Ren, Weiqiang Bai, Chaoqi Liang, Xinzhu Ma, Nanqing Dong, W. Ouyang (18 Dec 2024)

2. Do Language Models Understand Time?
   Xi Ding, Lei Wang (18 Dec 2024)

3. Unified Framework for Open-World Compositional Zero-shot Learning
   Hirunima Jayasekara, Khoi Pham, Nirat Saini, Abhinav Shrivastava (05 Dec 2024)

4. Renaissance: Investigating the Pretraining of Vision-Language Encoders
   Clayton Fields, C. Kennington (11 Nov 2024) [VLM]

5. VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models
   Harshit, Tolga Tasdizen (06 Oct 2024) [CoGe, VLM]

6. Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
   Kenza Amara, Lukas Klein, Carsten T. Lüth, Paul Jäger, Hendrik Strobelt, Mennatallah El-Assady (02 Oct 2024)

7. ComAlign: Compositional Alignment in Vision-Language Models
   Ali Abdollah, Amirmohammad Izadi, Armin Saghafian, Reza Vahidimajd, Mohammad Mozafari, Amirreza Mirzaei, Mohammadmahdi Samiei, M. Baghshah (12 Sep 2024) [CoGe, VLM]

8. CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
   Ivana Beňová, Michal Gregor, Albert Gatt (02 Sep 2024)

9. BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval
   Zhenyu Lu, Lakshay Sethi (19 Aug 2024)

10. MuTT: A Multimodal Trajectory Transformer for Robot Skills
    Claudius Kienle, Benjamin Alt, Onur Celik, P. Becker, Darko Katic, Rainer Jäkel, Gerhard Neumann (22 Jul 2024)

11. How and where does CLIP process negation?
    Vincent Quantmeyer, Pablo Mosteiro, Albert Gatt (15 Jul 2024) [CoGe]

12. GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations
    Rick Wilming, Artur Dox, Hjalmar Schulz, Marta Oliveira, Benedict Clark, Stefan Haufe (17 Jun 2024)

13. No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
    Angeline Pouget, Lucas Beyer, Emanuele Bugliarello, Xiao Wang, Andreas Steiner, Xiao-Qi Zhai, Ibrahim M. Alabdulmohsin (22 May 2024) [VLM]

14. Acquiring Linguistic Knowledge from Multimodal Input
    Theodor Amariucai, Alexander Scott Warstadt (27 Feb 2024) [CLL]

15. Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking
    Ivana Beňová, Jana Kosecka, Michal Gregor, Martin Tamajka, Marcel Veselý, Marián Simko (29 Jan 2024)

16. Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
    Laura Cabello, Emanuele Bugliarello, Stephanie Brandl, Desmond Elliott (26 Oct 2023)

17. The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
    Xinyi Chen, Raquel Fernández, Sandro Pezzelle (23 Oct 2023) [VLM]

18. On the Language Encoder of Contrastive Cross-modal Models
    Mengjie Zhao, Junya Ono, Zhi-Wei Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji (20 Oct 2023) [VLM]

19. A Survey on Image-text Multimodal Models
    Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, Khai Le-Duc, Guiyong Chang, Dawei Liu, Sibo Zhang, Zhengbing Yao, Mingjun Xu, Liping Bu (23 Sep 2023) [VLM]

20. The Scenario Refiner: Grounding subjects in images at the morphological level
    Claudia Tagliaferri, Sofia Axioti, Albert Gatt, Denis Paperno (20 Sep 2023)

21. Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
    Fei-Yue Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding (24 Aug 2023)

22. Generic Attention-model Explainability by Weighted Relevance Accumulation
    Yiming Huang, Ao Jia, Xiaodan Zhang, Jiawei Zhang (20 Aug 2023)

23. Vision Language Transformers: A Survey
    Clayton Fields, C. Kennington (06 Jul 2023) [VLM]

24. Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
    Qingpei Guo, Kaisheng Yao, Wei Chu (25 Jun 2023) [MLLM]

25. Zero-shot Composed Text-Image Retrieval
    Yikun Liu, Jiangchao Yao, Ya-Qin Zhang, Yanfeng Wang, Weidi Xie (12 Jun 2023)

26. Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
    Paul Pu Liang, Zihao Deng, Martin Q. Ma, James Y. Zou, Louis-Philippe Morency, Ruslan Salakhutdinov (08 Jun 2023) [SSL]

27. Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
    Emanuele Bugliarello, Aida Nematzadeh, Lisa Anne Hendricks (23 May 2023) [SSL]

28. Semantic Composition in Visually Grounded Language Models
    Rohan Pandey (15 May 2023) [CoGe]

29. Measuring Progress in Fine-grained Vision-and-Language Understanding
    Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh (12 May 2023) [VLM]

30. A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
    Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang (08 May 2023) [LRM]

31. Multimodal Understanding Through Correlation Maximization and Minimization
    Yi Shi, Marc Niethammer (04 May 2023)

32. 3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
    Siming Yan, Yu-Qi Yang, Yu-Xiao Guo, Hao Pan, Peng-shuai Wang, Xin Tong, Yang Liu, Qi-Xing Huang (14 Apr 2023) [3DPC]

33. A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
    Lucas Beyer, Bo Wan, Gagan Madan, Filip Pavetić, Andreas Steiner, ..., Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai (30 Mar 2023)

34. A Two-Sided Discussion of Preregistration of NLP Research
    Anders Søgaard, Daniel Hershcovich, Miryam de Lhoneux (20 Feb 2023) [OnRL, AI4CE]

35. BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
    Ali Borji (28 Jan 2023) [CoGe]

36. Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
    Paul Lerner, O. Ferret, C. Guinaudeau (11 Jan 2023)

37. Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment
    Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency (20 Dec 2022)

38. Compound Tokens: Channel Fusion for Vision-Language Representation Learning
    Maxwell Mbabilla Aladago, A. Piergiovanni (02 Dec 2022)

39. Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
    Michele Cafagna, Kees van Deemter, Albert Gatt (09 Nov 2022) [CoGe]

40. Training Vision-Language Models with Less Bimodal Supervision
    Elad Segal, Ben Bogin, Jonathan Berant (01 Nov 2022) [VLM]

41. Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
    T. Wang, Jorma T. Laaksonen, T. Langer, Heikki Arponen, Tom E. Bishop (24 Oct 2022) [VLM]

42. Multilingual Multimodal Learning with Machine Translated Text
    Chen Qiu, Dan Oneaţă, Emanuele Bugliarello, Stella Frank, Desmond Elliott (24 Oct 2022)

43. Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
    Mitja Nikolaus, Emmanuelle Salin, Stéphane Ayache, Abdellah Fourtassi, Benoit Favre (21 Oct 2022)

44. LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
    Hongcheng Guo, Jiaheng Liu, Haoyang Huang, Jian Yang, Zhoujun Li, Dongdong Zhang, Zheng Cui, Furu Wei (19 Oct 2022)

45. One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks
    Gregor Geigle, Chen Cecilia Liu, Jonas Pfeiffer, Iryna Gurevych (12 Oct 2022) [VLM]

46. How to Adapt Pre-trained Vision-and-Language Models to a Text-only Input?
    Lovisa Hagström, Richard Johansson (19 Sep 2022) [VLM]

47. FashionViL: Fashion-Focused Vision-and-Language Representation Learning
    Xiaoping Han, Licheng Yu, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang (17 Jul 2022) [AI4TS]

48. Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
    Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh (24 May 2022) [OOD]

49. Visual Spatial Reasoning
    Fangyu Liu, Guy Edward Toh Emerson, Nigel Collier (30 Apr 2022) [ReLM]

50. Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
    Yuchen Cui, S. Niekum, Abhi Gupta, Vikash Kumar, Aravind Rajeswaran (23 Apr 2022) [LM&Ro]