Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.01359
Cited By
FOIL it! Find One mismatch between Image and Language caption
3 May 2017
Ravi Shekhar
Sandro Pezzelle
Yauhen Klimovich
Aurélie Herbelot
Moin Nabi
E. Sangineto
Raffaella Bernardi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FOIL it! Find One mismatch between Image and Language caption"
18 / 18 papers shown
Title
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
95
3
0
19 Mar 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
30
0
0
20 Jan 2025
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
31
1
0
24 Jun 2024
Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
A. Bavaresco
A. Testoni
Raquel Fernández
23
2
0
31 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
82
3
0
29 Apr 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
29
24
0
28 Feb 2024
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
19
1
0
30 May 2023
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
17
6
0
24 May 2023
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
19
32
0
25 May 2022
Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge
Brielen Madureira
David Schlangen
17
4
0
14 Apr 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
10
123
0
13 Jan 2022
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
31
33
0
17 Nov 2021
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee
Thomas Scialom
Seunghyun Yoon
Franck Dernoncourt
Kyomin Jung
CoGe
6
18
0
28 Aug 2021
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
19
3,610
0
06 Aug 2019
Evaluating the Representational Hub of Language and Vision Models
Ravi Shekhar
Ece Takmaz
Raquel Fernández
Raffaella Bernardi
17
11
0
12 Apr 2019
Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti
Albert Gatt
K. Camilleri
14
2
0
12 Oct 2018
How clever is the FiLM model, and how clever can it be?
A. Kuhnle
Huiyuan Xie
Ann A. Copestake
16
6
0
09 Sep 2018
Targeted Syntactic Evaluation of Language Models
Rebecca Marvin
Tal Linzen
18
406
0
27 Aug 2018
1