Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

3 July 2025

Taehoon Kim

Jongwook Choi

Yonghyun Jeong

Haeun Noh

Jaejun Yoo

Seungryul Baek

Jongwon Choi


Main: 12 pages

13 Figures

Bibliography: 3 pages

15 Tables

Abstract

We introduce a deepfake video detection approach that exploits pixel-wise temporal inconsistencies, which traditional spatial frequency-based detectors often overlook. Such detectors represent temporal information merely by stacking spatial frequency spectra across frames, and therefore fail to detect temporal artifacts in the pixel plane. Our approach instead applies a 1D Fourier transform along the time axis for each pixel, extracting features that are highly sensitive to temporal inconsistencies, especially in areas prone to unnatural movements. To precisely locate the regions containing these temporal artifacts, we introduce an attention proposal module trained end-to-end. In addition, our joint transformer module integrates the pixel-wise temporal frequency features with spatio-temporal context features, expanding the range of detectable forgery artifacts. Our framework advances deepfake video detection, providing robust performance across diverse and challenging detection scenarios.
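The core operation described in the abstract, a 1D Fourier transform along the time axis for each pixel, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `pixelwise_temporal_spectrum`, the grayscale `(T, H, W)` input layout, the mean removal, and the use of magnitude spectra are all assumptions made for illustration. The attention proposal module and the joint transformer module are not sketched here.

```python
import numpy as np


def pixelwise_temporal_spectrum(frames: np.ndarray) -> np.ndarray:
    """Magnitude of the 1D FFT taken along time, independently per pixel.

    frames: grayscale clip of shape (T, H, W) -- an assumed layout.
    Returns an array of shape (T // 2 + 1, H, W).
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3:
        raise ValueError("expected a (T, H, W) array")
    # Assumption: subtract each pixel's temporal mean so the DC bin
    # does not dominate the spectrum.
    centered = frames - frames.mean(axis=0, keepdims=True)
    # rfft over axis 0 transforms every pixel's time series at once.
    return np.abs(np.fft.rfft(centered, axis=0))


# Toy clip: all pixels static except one that flickers 4 times per clip.
T, H, W = 32, 8, 8
t = np.arange(T)
video = np.zeros((T, H, W))
video[:, 3, 4] = np.sin(2 * np.pi * 4 * t / T)

spec = pixelwise_temporal_spectrum(video)
peak_bin = int(spec[:, 3, 4].argmax())  # frequency bin with most energy
```

In this toy clip the flickering pixel concentrates its energy in a single temporal frequency bin while static pixels stay at zero, which is the kind of per-pixel temporal signature that a stack of per-frame spatial spectra does not expose directly.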


@article{kim2025_2507.02398,
  title={Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection},
  author={Taehoon Kim and Jongwook Choi and Yonghyun Jeong and Haeun Noh and Jaejun Yoo and Seungryul Baek and Jongwon Choi},
  journal={arXiv preprint arXiv:2507.02398},
  year={2025}
}
