OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation

25 February 2025
Yunpeng Gao
Chenhui Li
Zhongrui You
Junli Liu
Zhen Li
Pengan Chen
Qizhi Chen
Zhonghan Tang
Liansheng Wang
Penghui Yang
Yiwen Tang
Yuhang Tang
Shuai Liang
Songyi Zhu
Ziqin Xiong
Yifei Su
Xinyi Ye
Jianan Li
Yan Ding
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
Abstract

Vision-Language Navigation (VLN) aims to guide agents through an environment using both language instructions and visual cues, and it plays a pivotal role in embodied AI. Indoor VLN has been extensively studied, whereas outdoor aerial VLN remains underexplored. A likely reason is that outdoor aerial views cover vast areas, which makes data collection harder and has led to a lack of benchmarks. To address this problem, we propose OpenFly, a platform comprising a versatile toolchain and a large-scale benchmark for aerial VLN. First, we develop a highly automated toolchain for data collection, enabling automatic point cloud acquisition, scene semantic segmentation, flight trajectory creation, and instruction generation. Second, based on the toolchain, we construct a large-scale aerial VLN dataset with 100k trajectories, covering diverse heights and lengths across 18 scenes. The corresponding visual data are generated using various rendering engines and advanced techniques, including Unreal Engine, GTA V, Google Earth, and 3D Gaussian Splatting (3D GS). All data exhibit high visual quality. In particular, 3D GS supports real-to-sim rendering, further enhancing the realism of the dataset. Third, we propose OpenFly-Agent, a keyframe-aware VLN model that takes language instructions, current observations, and historical keyframes as input and outputs flight actions directly. Extensive analyses and experiments demonstrate the superiority of the OpenFly platform and OpenFly-Agent. The toolchain, dataset, and code will be open-sourced.
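The abstract describes OpenFly-Agent as consuming three inputs: a language instruction, the current observation, and a set of historical keyframes. The sketch below only illustrates what such an input bundle and a bounded keyframe memory could look like. It is not the paper's implementation: the names `KeyframeMemory`, `AgentInput`, and `step_input`, the flattened-list frame representation, and the mean-absolute-difference selection rule are all assumptions made for illustration, since the abstract does not say how keyframes are chosen.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Sequence


def frame_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized, flattened frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


@dataclass
class KeyframeMemory:
    """Bounded history of keyframes (hypothetical selection rule).

    A frame is kept when it differs enough from the most recent keyframe.
    This rule is an assumption, not the method from the paper.
    """
    max_keyframes: int = 4
    threshold: float = 0.2
    keyframes: Deque[List[float]] = field(default_factory=deque)

    def update(self, frame: Sequence[float]) -> bool:
        """Store the frame if it qualifies as a keyframe; return whether it was stored."""
        is_new = (
            not self.keyframes
            or frame_difference(frame, self.keyframes[-1]) > self.threshold
        )
        if is_new:
            self.keyframes.append(list(frame))
            if len(self.keyframes) > self.max_keyframes:
                self.keyframes.popleft()  # evict the oldest keyframe
        return is_new


@dataclass
class AgentInput:
    """The three inputs named in the abstract, bundled for one decision step."""
    instruction: str
    observation: List[float]
    keyframes: List[List[float]]


def step_input(instruction: str, observation: Sequence[float],
               memory: KeyframeMemory) -> AgentInput:
    """Snapshot the inputs for this step, then update the keyframe memory."""
    bundle = AgentInput(
        instruction=instruction,
        observation=list(observation),
        keyframes=[list(k) for k in memory.keyframes],
    )
    memory.update(observation)
    return bundle
```

A model wrapper would call `step_input` once per time step and map the returned bundle to a flight action; that mapping is the learned part and is deliberately left out here.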

@article{gao2025_2502.18041,
  title={OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation},
  author={Yunpeng Gao and Chenhui Li and Zhongrui You and Junli Liu and Zhen Li and Pengan Chen and Qizhi Chen and Zhonghan Tang and Liansheng Wang and Penghui Yang and Yiwen Tang and Yuhang Tang and Shuai Liang and Songyi Zhu and Ziqin Xiong and Yifei Su and Xinyi Ye and Jianan Li and Yan Ding and Dong Wang and Zhigang Wang and Bin Zhao and Xuelong Li},
  journal={arXiv preprint arXiv:2502.18041},
  year={2025}
}