Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey

In recent years, deepfakes (DFs) have been used for malicious purposes such as impersonating individuals, spreading misinformation, and imitating artists' styles, raising ethical and security concerns. In this survey, we provide a comprehensive review and comparison of passive DF detection across multiple modalities, including image, video, audio, and multi-modal content, and explore the relationships between these modalities. Beyond detection accuracy, we extend our analysis to performance dimensions essential for real-world deployment: generalization to novel generation techniques, robustness against adversarial manipulations and postprocessing techniques, attribution precision in identifying generation sources, and resilience under real-world operational conditions. Additionally, we analyze the advantages and limitations of existing datasets, benchmarks, and evaluation metrics for passive DF detection. Finally, we propose future research directions that address unexplored and emerging issues in passive DF detection. This survey offers researchers and practitioners a resource for understanding the current landscape, methodological approaches, and promising future directions in this rapidly evolving field.
@article{nguyen-le2025_2411.17911,
  title={Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey},
  author={Hong-Hanh Nguyen-Le and Van-Tuan Tran and Dinh-Thuc Nguyen and Nhien-An Le-Khac},
  journal={arXiv preprint arXiv:2411.17911},
  year={2025}
}