Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.01359
Cited By
FOIL it! Find One mismatch between Image and Language caption
3 May 2017
Ravi Shekhar
Sandro Pezzelle
Yauhen Klimovich
Aurélie Herbelot
Moin Nabi
E. Sangineto
Raffaella Bernardi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FOIL it! Find One mismatch between Image and Language caption"
21 / 21 papers shown
Title
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
95
3
0
19 Mar 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
30
0
0
20 Jan 2025
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
31
1
0
24 Jun 2024
Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
A. Bavaresco
A. Testoni
Raquel Fernández
25
2
0
31 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
82
3
0
29 Apr 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
34
24
0
28 Feb 2024
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
19
1
0
30 May 2023
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
17
6
0
24 May 2023
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
29
32
0
25 May 2022
Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge
Brielen Madureira
David Schlangen
17
4
0
14 Apr 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
10
123
0
13 Jan 2022
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
31
33
0
17 Nov 2021
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee
Thomas Scialom
Seunghyun Yoon
Franck Dernoncourt
Kyomin Jung
CoGe
8
18
0
28 Aug 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
11
1,434
0
18 Apr 2021
Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case
Adam Dahlgren Lindström
Suna Bensch
Johanna Björklund
F. Drewes
14
20
0
22 Feb 2021
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
37
3,610
0
06 Aug 2019
Evaluating the Representational Hub of Language and Vision Models
Ravi Shekhar
Ece Takmaz
Raquel Fernández
Raffaella Bernardi
22
11
0
12 Apr 2019
Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti
Albert Gatt
K. Camilleri
14
2
0
12 Oct 2018
How clever is the FiLM model, and how clever can it be?
A. Kuhnle
Huiyuan Xie
Ann A. Copestake
16
6
0
09 Sep 2018
Targeted Syntactic Evaluation of Language Models
Rebecca Marvin
Tal Linzen
23
406
0
27 Aug 2018
Defoiling Foiled Image Captions
Pranava Madhyastha
Josiah Wang
Lucia Specia
14
9
0
16 May 2018
1