
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

Main: 9 Pages
9 Figures
Bibliography: 4 Pages
12 Tables
Appendix: 8 Pages
Abstract

Visual manipulation localization (VML), the task of identifying tampered regions in images and videos, is crucial to digital forensics. However, existing methods often fail to generalize across the two modalities and struggle to handle high-resolution or long-duration inputs efficiently.
