UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation

16 February 2025

Abstract

Multimodal fake news detection typically demands complex architectures and substantial computational resources, posing deployment challenges in real-world settings. We introduce UNITE-FND, a novel framework that reframes multimodal fake news detection as a unimodal text classification task. We propose six specialized prompting strategies with Gemini 1.5 Pro that convert visual content into structured textual descriptions, enabling efficient text-only models to preserve critical visual information. To benchmark our approach, we introduce Uni-Fakeddit-55k, a curated family of datasets of 55,000 samples each, all processed through our multimodal-to-unimodal translation framework. Experimental results demonstrate that UNITE-FND achieves 92.52% accuracy in binary classification, surpassing prior multimodal models while reducing computational costs by over 10x (TinyBERT variant: 14.5M parameters vs. 250M+ in SOTA models). Additionally, we propose a comprehensive suite of five novel metrics to evaluate image-to-text conversion quality and verify that visual information is preserved. Our results demonstrate that structured text-based representations can replace direct multimodal processing with minimal loss of accuracy, making UNITE-FND a practical and scalable alternative for resource-constrained environments.
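As a rough illustration of the reframing described in the abstract, the sketch below fuses a post title with an image description into a single text input, the kind of string a text-only classifier could consume. The names `NewsSample`, `to_unimodal_text`, and `parameter_ratio`, the `[SEP]` separator, and the example strings are all assumptions made for illustration; they are not taken from the paper, and the image description is assumed to come from an upstream vision-language model. The ratio at the end only compares the parameter counts quoted in the abstract and is not a measurement of compute cost.

```python
from dataclasses import dataclass


@dataclass
class NewsSample:
    """Hypothetical container for one post (not the paper's data schema)."""
    title: str
    image_description: str  # assumed to be produced upstream by a vision-language model


def to_unimodal_text(sample: NewsSample, sep: str = " [SEP] ") -> str:
    """Join the title and the image description into one text-only input."""
    return sample.title.strip() + sep + sample.image_description.strip()


def parameter_ratio(large: float, small: float) -> float:
    """Ratio of two parameter counts; a proxy only, not a compute measurement."""
    return large / small


# Made-up example strings, purely for illustration.
sample = NewsSample(
    title="  Shark swims down flooded highway ",
    image_description="A shark-shaped object in murky water beside a road sign.",
)
text = to_unimodal_text(sample)

# Parameter counts quoted in the abstract: 250M+ (SOTA) vs. 14.5M (TinyBERT variant).
ratio = parameter_ratio(250e6, 14.5e6)
print(text)
print(f"parameter ratio: {ratio:.1f}x")
```

Under these assumptions the parameter ratio comes out at roughly 17x, which is consistent with the "over 10x" figure in the abstract, although parameter count and computational cost are not the same quantity.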


@article{mukherjee2025_2502.11132,
  title={UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation},
  author={Arka Mukherjee and Shreya Ghosh},
  journal={arXiv preprint arXiv:2502.11132},
  year={2025}
}
