
VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models

Abstract

The rapid rise of video diffusion models has enabled the generation of highly realistic and temporally coherent videos, raising critical concerns about content authenticity, provenance, and misuse. Existing watermarking approaches, whether passive, post-hoc, or adapted from image-based techniques, often struggle to withstand video-specific manipulations such as frame insertion, dropping, or reordering, and typically degrade visual quality. In this work, we introduce VIDSTAMP, a watermarking framework that embeds per-frame or per-segment messages directly into the latent space of temporally-aware video diffusion models. By fine-tuning the model's decoder through a two-stage pipeline, first on static image datasets to promote spatial message separation, and then on synthesized video sequences to restore temporal consistency, VIDSTAMP learns to embed high-capacity, flexible watermarks with minimal perceptual impact. Leveraging architectural components such as 3D convolutions and temporal attention, our method imposes no additional inference cost and offers better perceptual quality than prior methods, while maintaining comparable robustness against common distortions and tampering. VIDSTAMP embeds 768 bits per video (48 bits per frame) with a bit accuracy of 95.0%, achieves a log P-value of -166.65 (lower is better), and maintains a video quality score of 0.836, comparable to unwatermarked outputs (0.838) and surpassing prior methods in capacity-quality tradeoffs. Code: \url{this https URL}
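To make the reported detection numbers concrete, the sketch below shows one common way a log P-value can be derived from bit accuracy. It is not taken from the paper: it assumes a one-sided binomial test in which, under the null hypothesis of no watermark, each extracted bit matches the embedded message independently with probability 0.5, and it assumes the logarithm is base 10. The function name `log10_binomial_tail` is hypothetical.

```python
import math


def log10_binomial_tail(n_bits, n_matched, p=0.5):
    """Return log10 of P(X >= n_matched) for X ~ Binomial(n_bits, p).

    Assumption (not from the paper): this is the null model in which an
    unwatermarked video matches each message bit by chance.
    """
    log_terms = []
    for k in range(n_matched, n_bits + 1):
        # Natural log of the binomial coefficient C(n_bits, k).
        log_comb = (math.lgamma(n_bits + 1)
                    - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1))
        log_terms.append(log_comb
                         + k * math.log(p)
                         + (n_bits - k) * math.log1p(-p))
    # Log-sum-exp keeps the sum stable for extremely small probabilities.
    m = max(log_terms)
    log_tail = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return log_tail / math.log(10)


# Figures quoted in the abstract.
bits_per_frame = 48
n_bits = 768
n_frames = n_bits // bits_per_frame   # 16 frames per video
n_matched = round(0.95 * n_bits)      # 730 matching bits at 95.0% accuracy

log_p = log10_binomial_tail(n_bits, n_matched)
print(f"{n_frames} frames, {n_matched}/{n_bits} bits matched, log10 p = {log_p:.2f}")
```

Under these assumptions the result comes out at roughly -166.6, in line with the reported -166.65. The paper may aggregate per-video values differently, so this should be read only as an illustration of why high bit accuracy over 768 bits yields such a small p-value.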

@article{teymoorianfard2025_2505.01406,
  title={VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models},
  author={Mohammadreza Teymoorianfard and Shiqing Ma and Amir Houmansadr},
  journal={arXiv preprint arXiv:2505.01406},
  year={2025}
}