2011.15124
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Transactions of the Association for Computational Linguistics (TACL), 2020
30 November 2020
Emanuele Bugliarello
Robert Bamler
Naoaki Okazaki
Desmond Elliott
Papers citing
"Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"
50 / 69 papers shown
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Neural Information Processing Systems (NeurIPS), 2024
Lifeng Qiao
Peng Ye
Yuchen Ren
Weiqiang Bai
Chaoqi Liang
Cheng Wang
Nanqing Dong
W. Ouyang
366
16
0
18 Dec 2024
Do Language Models Understand Time?
The Web Conference (WWW), 2024
Xi Ding
Lei Wang
1.0K
13
0
18 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
386
1
0
05 Dec 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
221
1
0
11 Nov 2024
VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models
Harshit
Tolga Tasdizen
CoGe
VLM
200
2
0
06 Oct 2024
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara
Lukas Klein
Carsten T. Lüth
Paul Jäger
Hendrik Strobelt
Mennatallah El-Assady
228
3
0
02 Oct 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
296
1
0
12 Sep 2024
CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
Ivana Beňová
Michal Gregor
Albert Gatt
433
1
0
02 Sep 2024
BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval
Zhenyu Lu
Lakshay Sethi
238
0
0
19 Aug 2024
MuTT: A Multimodal Trajectory Transformer for Robot Skills
Claudius Kienle
Benjamin Alt
Onur Celik
P. Becker
Darko Katic
Rainer Jäkel
Gerhard Neumann
371
3
0
22 Jul 2024
How and where does CLIP process negation?
Vincent Quantmeyer
Pablo Mosteiro
Albert Gatt
CoGe
306
14
0
15 Jul 2024
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations
Rick Wilming
Artur Dox
Hjalmar Schulz
Marta Oliveira
Benedict Clark
Stefan Haufe
338
6
0
17 Jun 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiao-Qi Zhai
Ibrahim Alabdulmohsin
VLM
421
15
0
22 May 2024
Acquiring Linguistic Knowledge from Multimodal Input
Theodor Amariucai
Alexander Scott Warstadt
CLL
367
4
0
27 Feb 2024
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking
Ivana Beňová
Jana Kosecka
Michal Gregor
Martin Tamajka
Marcel Veselý
Marian Simko
240
2
0
29 Jan 2024
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models
Haicheng Liao
Huanming Shen
Zhenning Li
Chengyue Wang
Guofa Li
Yiming Bie
Chengzhong Xu
310
89
0
06 Dec 2023
Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Laura Cabello
Emanuele Bugliarello
Stephanie Brandl
Desmond Elliott
350
8
0
26 Oct 2023
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xinyi Chen
Raquel Fernández
Sandro Pezzelle
VLM
263
13
0
23 Oct 2023
On the Language Encoder of Contrastive Cross-modal Models
Mengjie Zhao
Junya Ono
Zhi-Wei Zhong
Chieh-Hsin Lai
Yuhta Takida
Naoki Murata
Wei-Hsiang Liao
Takashi Shibuya
Hiromi Wakaki
Yuki Mitsufuji
VLM
169
2
0
20 Oct 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
411
25
0
23 Sep 2023
The Scenario Refiner: Grounding subjects in images at the morphological level
Claudia Tagliaferri
Sofia Axioti
Albert Gatt
Denis Paperno
271
1
0
20 Sep 2023
Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
Haiwei Yang
Liang Ding
Jun Rao
Ye Liu
Li Shen
Changxing Ding
318
27
0
24 Aug 2023
Generic Attention-model Explainability by Weighted Relevance Accumulation
ACM Multimedia Asia (MA), 2023
Yiming Huang
Ao Jia
Xiaodan Zhang
Jiawei Zhang
176
4
0
20 Aug 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
237
8
0
06 Jul 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
European Conference on Computer Vision (ECCV), 2023
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
124
7
0
25 Jun 2023
Zero-shot Composed Text-Image Retrieval
British Machine Vision Conference (BMVC), 2023
Yikun Liu
Jiangchao Yao
Ya Zhang
Yanfeng Wang
Weidi Xie
315
38
0
12 Jun 2023
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Neural Information Processing Systems (NeurIPS), 2023
Paul Pu Liang
Zihao Deng
Martin Q. Ma
James Zou
Louis-Philippe Morency
Ruslan Salakhutdinov
SSL
349
100
0
08 Jun 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
344
6
0
23 May 2023
Semantic Composition in Visually Grounded Language Models
Rohan Pandey
CoGe
255
1
0
15 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
261
31
0
12 May 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yunxin Li
Baotian Hu
Xinyu Chen
Yuxin Ding
Lin Ma
Min Zhang
LRM
216
19
0
08 May 2023
Multimodal Understanding Through Correlation Maximization and Minimization
Yi Shi
Marc Niethammer
246
1
0
04 May 2023
3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
International Conference on Learning Representations (ICLR), 2023
Siming Yan
Yu-Qi Yang
Yu-Xiao Guo
Hao Pan
Peng-shuai Wang
Xin Tong
Yang Liu
Qi-Xing Huang
3DPC
285
21
0
14 Apr 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Tianlin Li
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
311
10
0
30 Mar 2023
A Two-Sided Discussion of Preregistration of NLP Research
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Anders Søgaard
Daniel Hershcovich
Miryam de Lhoneux
OnRL
AI4CE
239
4
0
20 Feb 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
153
2
0
28 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
European Conference on Information Retrieval (ECIR), 2023
Paul Lerner
O. Ferret
C. Guinaudeau
303
12
0
11 Jan 2023
Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Rohan Pandey
Rulin Shao
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
266
21
0
20 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
221
3
0
02 Dec 2022
Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
Michele Cafagna
Kees van Deemter
Albert Gatt
CoGe
200
4
0
09 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Conference on Automated Knowledge Base Construction (AKBC), 2022
Elad Segal
Ben Bogin
Jonathan Berant
VLM
153
2
0
01 Nov 2022
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Tong Wang
Jorma T. Laaksonen
T. Langer
Heikki Arponen
Tom E. Bishop
VLM
208
7
0
24 Oct 2022
Multilingual Multimodal Learning with Machine Translated Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Qiu
Dan Oneaţă
Emanuele Bugliarello
Stella Frank
Desmond Elliott
394
19
0
24 Oct 2022
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Mitja Nikolaus
Emmanuelle Salin
Stéphane Ayache
Abdellah Fourtassi
Benoit Favre
179
17
0
21 Oct 2022
LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hongcheng Guo
Jiaheng Liu
Haoyang Huang
Jian Yang
Zhoujun Li
Dongdong Zhang
Zheng Cui
Furu Wei
230
25
0
19 Oct 2022
One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks
Workshop on Representation Learning for NLP (RepL4NLP), 2022
Gregor Geigle
Chen Cecilia Liu
Jonas Pfeiffer
Iryna Gurevych
VLM
235
1
0
12 Oct 2022
How to Adapt Pre-trained Vision-and-Language Models to a Text-only Input?
International Conference on Computational Linguistics (COLING), 2022
Lovisa Hagström
Richard Johansson
VLM
172
4
0
19 Sep 2022
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
European Conference on Computer Vision (ECCV), 2022
Xiaoping Han
Licheng Yu
Xiatian Zhu
Li Zhang
Yi-Zhe Song
Tao Xiang
AI4TS
226
63
0
17 Jul 2022
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Findings (Findings), 2022
Aishwarya Agrawal
Ivana Kajić
Emanuele Bugliarello
Elnaz Davoodi
Anita Gergely
Phil Blunsom
Aida Nematzadeh
OOD
281
22
0
24 May 2022
Visual Spatial Reasoning
Transactions of the Association for Computational Linguistics (TACL), 2022
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
629
301
0
30 Apr 2022