Is Intermediate Fusion All You Need for UAV-based Collaborative Perception?

Collaborative perception enhances environmental awareness through inter-agent communication and is regarded as a promising solution for intelligent transportation systems. However, existing collaborative methods for Unmanned Aerial Vehicles (UAVs) overlook the unique characteristics of the UAV perspective, resulting in substantial communication overhead. To address this issue, we propose a novel communication-efficient collaborative perception framework based on late-intermediate fusion, dubbed LIF. The core concept is to exchange informative and compact detection results and shift the fusion stage to the feature representation level. In particular, we leverage vision-guided positional embedding (VPE) and box-based virtual augmented feature (BoBEV) to effectively integrate complementary information from various agents. Additionally, we introduce an uncertainty-driven communication mechanism that uses uncertainty evaluation to select high-quality and reliable shared areas. Experimental results show that LIF achieves superior performance with minimal communication bandwidth, demonstrating its effectiveness and practicality. Code and models are available at this https URL.
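As a rough illustration of the uncertainty-driven communication idea, the sketch below filters an agent's detections by an uncertainty score before sharing them. It is not the LIF implementation: the `Detection` fields, the `select_for_sharing` function, the threshold, and the per-message budget are all hypothetical names and values chosen for this example, and the abstract does not specify how uncertainty is computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    # Hypothetical record layout; the abstract does not define one.
    box: Tuple[float, float, float, float]  # (x, y, w, h) in a shared ground plane
    score: float                            # detector confidence
    uncertainty: float                      # lower means more reliable


def select_for_sharing(
    detections: List[Detection],
    max_uncertainty: float = 0.3,  # assumed threshold, not from the paper
    budget: int = 2,               # assumed cap on boxes per message
) -> List[Detection]:
    """Keep only reliable detections, most certain first, up to the budget."""
    reliable = [d for d in detections if d.uncertainty <= max_uncertainty]
    reliable.sort(key=lambda d: d.uncertainty)
    return reliable[:budget]


detections = [
    Detection((0.0, 0.0, 1.0, 1.0), 0.9, 0.10),
    Detection((1.0, 1.0, 1.0, 1.0), 0.8, 0.50),
    Detection((2.0, 2.0, 1.0, 1.0), 0.7, 0.20),
    Detection((3.0, 3.0, 1.0, 1.0), 0.6, 0.25),
]
shared = select_for_sharing(detections)
```

In this toy setup an agent transmits a few boxes rather than dense feature maps, which is the general reason sharing detection results can save bandwidth. How the received boxes are fused at the feature level (VPE, BoBEV) is outside the scope of the sketch.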
@article{hao2025_2504.21774,
  title={Is Intermediate Fusion All You Need for UAV-based Collaborative Perception?},
  author={Jiuwu Hao and Liguo Sun and Yuting Wan and Yueyang Wu and Ti Xiang and Haolin Song and Pin Lv},
  journal={arXiv preprint arXiv:2504.21774},
  year={2025}
}