RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

13 August 2025

Wen Huang

Main: 9 pages

9 figures

Bibliography: 4 pages

12 tables

Appendix: 8 pages

Abstract

Visual manipulation localization (VML), across both images and videos, is a crucial task in digital forensics that involves identifying tampered regions in visual content. However, existing methods often lack cross-modal generalization and struggle to handle high-resolution or long-duration inputs efficiently.