Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion

In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs), exploring the use of generative models for vision-based navigation of Uncrewed Aerial Vehicles (UAVs). Unlike classical VS methods, the proposed approach can reach the desired target view even when the target is not visible in the initial view. This is made possible by a learned latent representation that the DDPM uses for planning, together with a dataset of trajectories that includes initial views in which the target is not visible. The compact representation is learned from raw images using a Cross-Modal Variational Autoencoder. Given the current image, the DDPM generates trajectories in the latent space that drive the robotic platform to the desired visual target. The approach has been validated in simulation on two generic multi-rotor UAVs (a quadrotor and a hexarotor). The results show that the visual target is reached successfully even when it is not visible in the initial view.
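To make the planning step more concrete, the following is a minimal sketch of standard DDPM ancestral sampling applied to a trajectory of latent vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear noise schedule, the number of denoising steps, the way the return is passed to the noise model, and the choice to pin the first latent to the encoding of the current image are all hypothetical. A trained noise-prediction network would replace `dummy_noise_model`.

```python
import math
import random


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Linear variance schedule (an assumed choice); returns betas and alpha-bars."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def dummy_noise_model(traj, step, target_return):
    """Hypothetical stand-in for a trained, return-conditioned noise predictor."""
    return [[0.0] * len(z) for z in traj]


def sample_latent_trajectory(z_current, horizon, noise_model,
                             target_return=1.0, num_steps=50, seed=0):
    """Denoise a random latent trajectory whose first element is the current latent."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    dim = len(z_current)
    # Start from pure Gaussian noise over the whole planning horizon.
    traj = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(horizon)]
    for t in reversed(range(num_steps)):
        # Assumption: condition on the current view by pinning the first latent.
        traj[0] = list(z_current)
        eps = noise_model(traj, t, target_return)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        # No noise is added at the final denoising step.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        traj = [[scale * (z - coef * e) + sigma * rng.gauss(0.0, 1.0)
                 for z, e in zip(z_row, e_row)]
                for z_row, e_row in zip(traj, eps)]
    traj[0] = list(z_current)
    return traj


# Example: plan 8 latent steps from a 3-dimensional latent of the current image.
plan = sample_latent_trajectory([0.5, -0.2, 0.1], horizon=8,
                                noise_model=dummy_noise_model)
print(len(plan), len(plan[0]))
```

In this sketch the latent of the current image would come from the encoder of the learned representation, and the generated latent trajectory would then be handed to whatever controller tracks it; both of those stages are outside what the abstract specifies and are left out here.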
@article{gerges2025_2409.13337,
  title={Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion},
  author={Bishoy Gerges and Barbara Bazzana and Nicolò Botteghi and Youssef Aboudorra and Antonio Franchi},
  journal={arXiv preprint arXiv:2409.13337},
  year={2025}
}