Towards Explainable Fake Image Detection with Multi-Modal Large Language Models

Abstract

Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both strong generalization and transparency. Recent progress in Multi-modal Large Language Models (MLLMs) offers new opportunities for reasoning-based AI-generated image detection. In this work, we evaluate the capabilities of MLLMs in comparison to traditional detection methods and human evaluators, highlighting their strengths and limitations. Furthermore, we design six distinct prompts and propose a framework that integrates these prompts to develop a more robust, explainable, and reasoning-driven detection system. The code is available at this https URL.
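The abstract describes a framework that queries an MLLM with several detection prompts and combines their outputs. The sketch below is only an illustration of that general idea, assuming an OpenAI-compatible vision chat endpoint and majority-vote aggregation; the prompt texts, model name, and aggregation rule are placeholders and are not the six prompts or the integration scheme proposed in the paper.

```python
# Hypothetical sketch: multi-prompt MLLM querying with majority-vote aggregation.
# Assumes an OpenAI-compatible vision chat endpoint; prompts are illustrative
# placeholders, NOT the six prompts designed in the paper.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPTS = [
    "Is this image AI-generated? Answer REAL or FAKE, then explain the visual evidence.",
    "Inspect textures, lighting, and anatomy for generation artifacts. Verdict: REAL or FAKE?",
    "Could a diffusion model have produced this image? Answer REAL or FAKE with reasons.",
]

def encode_image(path: str) -> str:
    """Read an image file and return a base64 data URL for the chat API."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{data}"

def detect(image_path: str, model: str = "gpt-4o") -> dict:
    """Query the MLLM once per prompt and aggregate verdicts by majority vote."""
    image_url = encode_image(image_path)
    verdicts, explanations = [], []
    for prompt in PROMPTS:
        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        )
        answer = response.choices[0].message.content
        verdicts.append("FAKE" if "FAKE" in answer.upper() else "REAL")
        explanations.append(answer)
    label = "FAKE" if verdicts.count("FAKE") > len(verdicts) / 2 else "REAL"
    return {"label": label, "votes": verdicts, "explanations": explanations}

if __name__ == "__main__":
    print(detect("suspect.jpg")["label"])
```

Keeping the per-prompt explanations alongside the final verdict is what makes such a pipeline explainable rather than a single opaque score; the actual framework in the paper should be consulted for the real prompt designs and integration strategy.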

@article{ji2025_2504.14245,
  title={Towards Explainable Fake Image Detection with Multi-Modal Large Language Models},
  author={Yikun Ji and Yan Hong and Jiahui Zhan and Haoxing Chen and Jun Lan and Huijia Zhu and Weiqiang Wang and Liqing Zhang and Jianfu Zhang},
  journal={arXiv preprint arXiv:2504.14245},
  year={2025}
}