
Efficient Adaptation For Remote Sensing Visual Grounding

Abstract

Adapting pre-trained models has become an effective strategy in artificial intelligence, offering a scalable and efficient alternative to training models from scratch. In remote sensing (RS), where visual grounding (VG) remains underexplored, this approach enables powerful vision-language models to achieve robust cross-modal understanding while significantly reducing computational overhead. To this end, we applied Parameter-Efficient Fine-Tuning (PEFT) techniques to adapt such models to RS-specific VG tasks. Specifically, we evaluated LoRA placement across different modules of Grounding DINO, and used BitFit and adapters to fine-tune the OFA foundation model pre-trained on general-purpose VG datasets. These adaptations achieved performance comparable to or surpassing current state-of-the-art (SOTA) models while substantially reducing computational costs. This study highlights the potential of PEFT techniques to advance efficient and precise multi-modal analysis in RS, offering a practical and cost-effective alternative to full model training.
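To illustrate the core idea behind the LoRA technique mentioned above, the sketch below shows a low-rank update applied to a single frozen linear layer. This is a minimal, self-contained illustration, not the paper's implementation: the actual work places LoRA inside specific Grounding DINO modules, and the class and function names here are hypothetical.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: frozen weight W plus a trainable low-rank update B @ A.

    Illustrative only; the paper applies LoRA inside Grounding DINO modules,
    which is not reproduced here.
    """

    def __init__(self, weight, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = weight  # frozen pre-trained weight, shape (d_out, d_in)
        d_out, d_in = weight.shape
        # A is small-random-initialized, B is zero-initialized, so the
        # adapted layer starts out identical to the frozen base layer.
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection
        self.scale = alpha / r

    def __call__(self, x):
        # Base output plus scaled low-rank correction; only A and B are trained.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T


def trainable_fraction(d_out, d_in, r):
    """Fraction of parameters LoRA updates compared to full fine-tuning."""
    return r * (d_in + d_out) / (d_in * d_out)
```

Because B is zero-initialized, the adapted layer reproduces the pre-trained layer's outputs exactly before training, and `trainable_fraction` shows why the approach cuts computational cost: for a 1024x1024 layer with r=8, only about 1.6% of the parameters are updated.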

@article{moughnieh2025_2503.23083,
  title={Efficient Adaptation For Remote Sensing Visual Grounding},
  author={Hasan Moughnieh and Mohamad Chalhoub and Hasan Nasrallah and Cristiano Nattero and Paolo Campanella and Giovanni Nico and Ali J. Ghandour},
  journal={arXiv preprint arXiv:2503.23083},
  year={2025}
}