Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis

Junyan Ye
Jun He
Weijia Li
Zhutao Lv
Yi Lin
Jinhua Yu
Haote Yang
Conghui He
Abstract

Ground-to-aerial image synthesis focuses on generating realistic aerial images from corresponding ground street-view images while maintaining a consistent content layout, simulating a top-down view. The significant viewpoint difference leads to domain gaps between views, and dense urban scenes limit the visible range of street views, making this cross-view generation task particularly challenging. In this paper, we introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street-view images, utilizing a diffusion model and the Bird's-Eye View (BEV) paradigm. The Curved-BEV method in SkyDiffusion converts street-view images into a BEV perspective, effectively bridging the domain gap, and employs a "multi-to-one" mapping strategy to address occlusion issues in dense urban scenes. SkyDiffusion then employs a BEV-guided diffusion model to generate content-consistent and realistic aerial images. Additionally, we introduce a novel dataset, Ground2Aerial-3, designed for diverse ground-to-aerial image synthesis applications, including disaster-scene aerial synthesis, low-altitude UAV image synthesis, and historical high-resolution satellite image synthesis tasks. Experimental results demonstrate that SkyDiffusion outperforms state-of-the-art methods on cross-view datasets across natural (CVUSA), suburban (CVACT), urban (VIGOR-Chicago), and various application scenarios (G2A-3), achieving realistic and content-consistent aerial image generation. The code, datasets, and more information on this work can be found at this https URL.
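To give an intuition for the street-view-to-BEV step described in the abstract, the sketch below inverse-projects an equirectangular panorama onto a flat ground-plane grid. This is a minimal flat-plane illustration only: the function name, parameters, and the flat-ground assumption are ours, and this is not the paper's actual Curved-BEV transformation or its multi-to-one mapping.

```python
import numpy as np

def panorama_to_bev(pano, bev_size=256, ground_range=20.0, cam_height=2.0):
    """Project an equirectangular street-view panorama onto a flat
    ground-plane BEV grid (illustrative sketch, not the Curved-BEV method).

    pano: (H, W, 3) equirectangular image with the horizon at row H/2.
    bev_size: output BEV resolution in pixels per side.
    ground_range: half-width of the covered ground area, in metres.
    cam_height: assumed camera height above the ground, in metres.
    """
    H, W = pano.shape[:2]
    # Ground-plane coordinates of every BEV cell, camera at the centre.
    coords = ((np.arange(bev_size) + 0.5) / bev_size * 2.0 - 1.0) * ground_range
    x, y = np.meshgrid(coords, coords)
    r = np.sqrt(x ** 2 + y ** 2) + 1e-8
    # Viewing direction of each ground point: azimuth around the camera
    # and depression angle below the horizon.
    azimuth = np.arctan2(x, y)              # [-pi, pi]
    depression = np.arctan2(cam_height, r)  # (0, pi/2)
    # Map angles to equirectangular pixel coordinates and sample.
    u = ((azimuth / (2.0 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip((H / 2 + depression / np.pi * H).astype(int), 0, H - 1)
    return pano[v, u]
```

Each BEV cell looks up the panorama pixel whose ray would hit that ground point, so nearby ground covers the lower panorama rows and distant ground converges toward the horizon; occluders (the motivation for the paper's multi-to-one strategy) are ignored here.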

@article{ye2025_2408.01812,
  title={ Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis },
  author={ Junyan Ye and Jun He and Weijia Li and Zhutao Lv and Yi Lin and Jinhua Yu and Haote Yang and Conghui He },
  journal={arXiv preprint arXiv:2408.01812},
  year={ 2025 }
}