arXiv: 2012.12352 (v4, latest)
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu, Albert Gatt, Anette Frank, Iacer Calixto
22 December 2020
Papers citing "Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks" (22 papers)
1. What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning
   Zhaotian Weng, Haoxuan Li, Kuan-Hao Huang, Jieyu Zhao. 01 Jun 2025.

2. VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
   Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
   Hugh Mee Wong, Rick Nouwen, Albert Gatt. 17 Feb 2025.

3. CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
   Ivana Beňová, Michal Gregor, Albert Gatt. 02 Sep 2024.

4. What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
   Michal Golovanevsky, William Rudman, Vedant Palit, Ritambhara Singh, Carsten Eickhoff. 24 Jun 2024.

5. ColorFoil: Investigating Color Blindness in Large Vision and Language Models
   Ahnaf Mozib Samin, M. F. Ahmed, Md. Mushtaq Shahriyar Rafee. 19 May 2024.

6. Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking
   Ivana Beňová, Jana Kosecka, Michal Gregor, Martin Tamajka, Marcel Veselý, Marian Simko. 29 Jan 2024.

7. The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models
   Chenwei Wu, Erran L. Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang. 04 Oct 2023.

8. Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
   Haiwei Yang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding. 24 Aug 2023.

9. Controlling for Stereotypes in Multimodal Language Model Evaluation
   BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023.
   Manuj Malik, Richard Johansson. 03 Feb 2023.

10. MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
    Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
    Letitia Parcalabescu, Anette Frank. 15 Dec 2022.

11. Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
    Mitja Nikolaus, Emmanuelle Salin, Stéphane Ayache, Abdellah Fourtassi, Benoit Favre. 21 Oct 2022.

12. Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
    Zheng Ma, Shi Zong, Mianzhi Pan, Jianbing Zhang, Shujian Huang, Xinyu Dai, Jiajun Chen. 18 Oct 2022.

13. When and why vision-language models behave like bags-of-words, and what to do about it?
    International Conference on Learning Representations (ICLR), 2022.
    Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, James Zou. 04 Oct 2022.

14. Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
    Computer Vision and Pattern Recognition (CVPR), 2022.
    Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross. 07 Apr 2022.

15. On Explaining Multimodal Hateful Meme Detection Models
    The Web Conference (WWW), 2022.
    Ming Shan Hee, Roy Ka-wei Lee, Wen-Haw Chong. 04 Apr 2022.

16. Finding Structural Knowledge in Multimodal-BERT
    Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
    Victor Milewski, Miryam de Lhoneux, Marie-Francine Moens. 17 Mar 2022.

17. DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
    AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022.
    Yiwei Lyu, Paul Pu Liang, Zihao Deng, Ruslan Salakhutdinov, Louis-Philippe Morency. 03 Mar 2022.

18. VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
    Letitia Parcalabescu, Michele Cafagna, Lilitta Muradjan, Anette Frank, Iacer Calixto, Albert Gatt. 14 Dec 2021.
19. TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
    Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.
    Keng Ji Chow, Samson Tan, Min-Yen Kan. 21 Nov 2021.

20. Recent Advances of Continual Learning in Computer Vision: An Overview
    IET Computer Vision (ICV), 2021.
    Haoxuan Qu, Hossein Rahmani, Kepeng Xu, Bryan M. Williams, Jun Liu. 23 Sep 2021.

21. What Vision-Language Models 'See' when they See Scenes
    Michele Cafagna, Kees van Deemter, Albert Gatt. 15 Sep 2021.

22. Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
    Stella Frank, Emanuele Bugliarello, Desmond Elliott. 09 Sep 2021.