RemoteDet-Mamba: A Hybrid Mamba-CNN Network for Multi-modal Object Detection in Remote Sensing Images

Main: 4 pages
4 figures
Bibliography: 1 page
2 tables
Abstract

Unmanned aerial vehicle (UAV) remote sensing offers rapid information acquisition at low cost and is widely used in scenarios such as emergency response. However, because of the long imaging distance and complex imaging mechanisms, targets in remote sensing images are often small, densely distributed, and hard to tell apart across classes. To address these issues, this paper proposes RemoteDet-Mamba, a multi-modal remote sensing object detection network built on a patch-level, four-direction selective scanning fusion strategy. The method learns unimodal local features while fusing cross-modal, patch-level global semantic information, which makes small objects easier to distinguish and improves inter-class discrimination. The lightweight fusion mechanism also decouples densely packed targets while reducing computational complexity. Experiments on the DroneVehicle dataset show that RemoteDet-Mamba achieves better detection performance than current mainstream methods while keeping parameter count and computational overhead low, which suggests potential for practical applications.
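To make the idea of patch-level four-direction scanning concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which four directions are used; the sketch assumes the row-major, column-major, and two reversed orders common in 2-D selective-scan models. It leaves out the selective state-space model itself and only shows how a patch grid could be unrolled into four sequences and merged back. The names `four_direction_scans` and `merge_scans` are hypothetical.

```python
from typing import List, Sequence


def four_direction_scans(grid: Sequence[Sequence[float]]) -> List[List[float]]:
    """Unroll an H x W patch grid into four 1-D scan orders.

    Assumed orders (not stated in the abstract): row-major, reversed
    row-major, column-major, reversed column-major.
    """
    height, width = len(grid), len(grid[0])
    row_major = [grid[r][c] for r in range(height) for c in range(width)]
    col_major = [grid[r][c] for c in range(width) for r in range(height)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def _col_to_row(seq: Sequence[float], height: int, width: int) -> List[float]:
    """Reorder a column-major sequence back into row-major order."""
    return [seq[c * height + r] for r in range(height) for c in range(width)]


def merge_scans(scans: Sequence[Sequence[float]], height: int, width: int) -> List[float]:
    """Restore each scan to row-major order and average the four per-patch values.

    In a real model each scan would first pass through a sequence model;
    here the scans are merged unchanged, so the input grid is recovered.
    """
    fwd_row, bwd_row, fwd_col, bwd_col = scans
    restored = [
        list(fwd_row),
        list(bwd_row)[::-1],
        _col_to_row(fwd_col, height, width),
        _col_to_row(list(bwd_col)[::-1], height, width),
    ]
    return [sum(values) / len(restored) for values in zip(*restored)]


if __name__ == "__main__":
    # A toy 2 x 3 grid of scalar "patch features" (hypothetical data).
    patches = [[1, 2, 3], [4, 5, 6]]
    scans = four_direction_scans(patches)
    print(scans[2])  # column-major order: [1, 4, 2, 5, 3, 6]
    print(merge_scans(scans, 2, 3))
```

Scanning in several directions gives each patch context from patches on all sides, which a single causal 1-D scan cannot provide. How RemoteDet-Mamba combines the two modalities before or during scanning is not described in the abstract and is not modeled here.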
