UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation

Abstract

UAV-VLA (Vision-Language-Action) is a system designed to facilitate communication with aerial robots. By integrating satellite imagery processing with a Visual Language Model (VLM) and GPT, UAV-VLA lets users generate general flight path-and-action plans from simple text requests. The system leverages the rich contextual information in satellite images to support decision-making and mission planning. Combining visual analysis by the VLM with natural language processing by GPT provides the user with a path-and-action set, making aerial operations more efficient and accessible. The newly developed method showed a 22% difference in the length of the generated trajectory and a mean error of 34.22 m (Euclidean distance) in locating objects of interest on a map using the K-Nearest Neighbors (KNN) approach.
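The two figures reported above can be illustrated with a minimal sketch. The abstract does not specify the exact evaluation procedure, so everything here is an assumption: the function names are hypothetical, coordinates are taken to be planar and in metres, the trajectory difference is taken relative to some reference path, and the KNN matching is taken to be nearest-neighbor (k = 1) from each predicted point to the ground-truth points.

```python
import math

def path_length(points):
    """Total length of a polyline given as (x, y) tuples in metres."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def relative_length_difference(generated, reference):
    """Absolute difference in path length, as a percentage of the reference length.

    Assumption: the reference path is the baseline the generated path is compared to.
    """
    ref = path_length(reference)
    return abs(path_length(generated) - ref) / ref * 100.0

def mean_nearest_neighbor_error(predicted, ground_truth):
    """Mean Euclidean distance from each predicted point to its nearest ground-truth point.

    Assumption: k = 1 nearest-neighbor matching; the source's KNN setup may differ.
    """
    nearest = [min(math.dist(p, g) for g in ground_truth) for p in predicted]
    return sum(nearest) / len(nearest)

if __name__ == "__main__":
    # Made-up toy data, not results from the paper.
    reference_path = [(0.0, 0.0), (100.0, 0.0)]
    generated_path = [(0.0, 0.0), (61.0, 0.0), (122.0, 0.0)]
    print(relative_length_difference(generated_path, reference_path))

    predicted_objects = [(3.0, 4.0), (10.0, 0.0)]
    true_objects = [(0.0, 0.0), (10.0, 10.0)]
    print(mean_nearest_neighbor_error(predicted_objects, true_objects))
```

With real satellite data the points would first need to be projected from pixel or latitude/longitude coordinates into metres; that step is omitted here.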

@article{sautenkov2025_2501.05014,
  title={UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation},
  author={Oleg Sautenkov and Yasheerah Yaqoot and Artem Lykov and Muhammad Ahsan Mustafa and Grik Tadevosyan and Aibek Akhmetkazy and Miguel Altamirano Cabrera and Mikhail Martynov and Sausar Karaf and Dzmitry Tsetserukou},
  journal={arXiv preprint arXiv:2501.05014},
  year={2025}
}