Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation

Abstract
We propose a cross-attention, transformer-based method for multimodal sensor fusion that builds a bird's-eye view of a vessel's surroundings to support safer autonomous marine navigation. The model deeply fuses multiview RGB and long-wave infrared images with sparse LiDAR point clouds. Training additionally integrates X-band radar and electronic chart data to inform predictions. The resulting view provides a detailed, reliable scene representation, improving navigational accuracy and robustness. Real-world sea trials confirm the method's effectiveness even in adverse weather and complex maritime settings.
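To make the fusion idea concrete, below is a minimal sketch of how cross-attention can project multiple sensor streams into a shared bird's-eye-view (BEV) grid. All module names, dimensions, token counts, and the single-layer fusion order are illustrative assumptions; the abstract does not specify the authors' actual architecture.

```python
# Minimal sketch: learnable BEV queries cross-attend to flattened sensor tokens.
# Shapes, names, and the one-layer design are assumptions, not the paper's method.
import torch
import torch.nn as nn

class CrossAttentionBEVFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8, bev_h=100, bev_w=100):
        super().__init__()
        # One learnable query per cell of the bird's-eye-view grid.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, d_model))
        # Cross-attention: BEV queries (Q) attend to sensor tokens (K, V).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, rgb_feats, lwir_feats, lidar_feats):
        # Each input: (batch, num_tokens, d_model), produced by per-modality
        # encoders (e.g., CNN backbones for RGB/LWIR, a point encoder for LiDAR).
        tokens = torch.cat([rgb_feats, lwir_feats, lidar_feats], dim=1)
        batch = tokens.size(0)
        queries = self.bev_queries.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.cross_attn(queries, tokens, tokens)
        # Residual connection plus normalization over the BEV grid features.
        return self.norm(fused + queries)  # (batch, bev_h * bev_w, d_model)

# Usage with dummy per-modality features:
model = CrossAttentionBEVFusion()
rgb = torch.randn(2, 600, 256)    # multiview RGB tokens
lwir = torch.randn(2, 600, 256)   # long-wave infrared tokens
lidar = torch.randn(2, 200, 256)  # sparse LiDAR tokens
bev = model(rgb, lwir, lidar)
print(bev.shape)  # torch.Size([2, 10000, 256])
```

In this formulation the BEV grid, rather than any single sensor, owns the queries, so each grid cell can draw on whichever modality is most informative there; how radar and chart data enter training is not detailed in the abstract and is omitted here.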
@article{dagdilelis2025_2505.01615,
  title={Multimodal and Multiview Deep Fusion for Autonomous Marine Navigation},
  author={Dimitrios Dagdilelis and Panagiotis Grigoriadis and Roberto Galeazzi},
  journal={arXiv preprint arXiv:2505.01615},
  year={2025}
}