Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. To address these scenarios, this research develops a rapid SAR system, UAV-VLRR (Vision-Language-Rapid-Response). The system consists of two components: 1) a multimodal pipeline that harnesses a Vision-Language Model (VLM) and the natural language processing capabilities of ChatGPT-4o (LLM) for scene interpretation; 2) a non-linear model predictive controller (NMPC) with built-in obstacle avoidance that enables the drone to fly rapidly according to the output of the multimodal pipeline. This work aims to improve response times in emergency SAR operations by giving the operator a more intuitive and natural way to plan the SAR mission while allowing the drone to carry out that mission rapidly and safely. In testing, our approach was faster on average by 33.75% compared with an off-the-shelf autopilot and by 54.6% compared with a human pilot. Video of UAV-VLRR: this https URL
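The abstract's second component is a receding-horizon NMPC that tracks a goal produced by the vision-language pipeline while avoiding obstacles. The paper does not give its formulation, so the following is only a minimal illustrative sketch of the receding-horizon idea, using a hypothetical 2D point-mass model and a sampling-based ("random shooting") optimizer with a soft obstacle penalty; the real system would use a proper nonlinear drone model and solver.

```python
import numpy as np

def nmpc_step(pos, goal, obstacles, horizon=10, samples=200,
              dt=0.1, v_max=1.0, rng=None):
    """One receding-horizon step (illustrative, not the paper's controller):
    sample candidate velocity sequences, roll each one forward, score it by
    goal-tracking cost plus a soft obstacle penalty, and return the first
    velocity of the cheapest rollout."""
    rng = np.random.default_rng(0) if rng is None else rng
    best_cost, best_v = np.inf, np.zeros(2)
    for _ in range(samples):
        vs = rng.uniform(-v_max, v_max, size=(horizon, 2))
        p, cost = np.array(pos, dtype=float), 0.0
        for v in vs:
            p = p + dt * v
            cost += np.linalg.norm(p - goal)        # track the goal
            for center, radius in obstacles:        # soft obstacle penalty
                d = np.linalg.norm(p - center)
                if d < radius:
                    cost += 100.0 * (radius - d)
        if cost < best_cost:
            best_cost, best_v = cost, vs[0]
    return best_v

# Receding-horizon loop: apply only the first control, then re-plan.
pos, goal = np.array([0.0, 0.0]), np.array([2.0, 0.0])
obstacles = [(np.array([1.0, 0.0]), 0.3)]  # (center, radius) pairs
rng = np.random.default_rng(42)
for _ in range(60):
    v = nmpc_step(pos, goal, obstacles, rng=rng)
    pos = pos + 0.1 * v
```

Re-planning at every step is what makes the scheme "rapid response": when the multimodal pipeline updates `goal`, the controller simply tracks the new target on the next iteration.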
@article{yaqoot2025_2503.02465,
  title   = {UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue},
  author  = {Yasheerah Yaqoot and Muhammad Ahsan Mustafa and Oleg Sautenkov and Artem Lykov and Valerii Serpiva and Dzmitry Tsetserukou},
  journal = {arXiv preprint arXiv:2503.02465},
  year    = {2025}
}