v1v2 (latest)

M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset

27 October 2025

Jiahui Geng

Jonathan Tonglet

Iryna Gurevych

ArXiv (abs)PDF HTML Github (3★)

Main:8 Pages

18 Figures

Bibliography:4 Pages

19 Tables

Appendix:14 Pages

Abstract

Existing real-world datasets for multimodal fact-checking have multiple limitations: they contain few instances, focus on only one or two languages and tasks, suffer from evidence leakage, or rely on external sets of news articles for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent a diverse range of cultural and geographic contexts. Each claim is available in one or two out of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake image detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks influences verdict prediction performance. We make our dataset and code available.

View on arXiv

Comments on this paper