158
v1v2v3 (latest)

MultiCheck: Strengthening Web Trust with Unified Multimodal Fact Verification

Main:8 Pages
8 Figures
Bibliography:2 Pages
17 Tables
Appendix:1 Pages
Abstract

Misinformation on the web increasingly appears in multimodal forms, combining text, images, and OCR-rendered content in ways that amplify harm to public trust and vulnerable communities. While prior fact-checking systems often rely on unimodal signals or shallow fusion strategies, modern misinformation campaigns operate across modalities and require models that can reason over subtle cross-modal inconsistencies in a transparent and responsible manner. We introduce MultiCheck, a lightweight and interpretable framework for multimodal fact verification that jointly analyzes textual, visual, and OCR evidence. At its core, MultiCheck employs a relational fusion module based on element-wise difference and product operations, allowing for explicit cross-modal interaction modeling with minimal computational overhead. A contrastive alignment objective further helps the model distinguish between supporting and refuting evidence while maintaining a small memory and energy footprint, making it suitable for low-resource deployment. Evaluated on the Factify-2 (5-class) and Mocheg (3-class) benchmarks, MultiCheck achieves huge performance improvement and remains robust under noisy OCR and missing modality conditions. Its efficiency, transparency, and real-world robustness make it well-suited for journalists, civil society organisations, and web integrity efforts working to build a safer and more trustworthy web.

View on arXiv
Comments on this paper