UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement

1 July 2025

Xiao Zhang

Fei Wei

Yong Wang

Wenda Zhao

Feiyi Li

Xiangxiang Chu

VLM

Main: 8 Pages

9 Figures

Bibliography: 3 Pages

14 Tables

Appendix: 4 Pages

Abstract

Zero-shot domain adaptation (ZSDA) presents substantial challenges due to the lack of images in the target domain. Previous approaches leverage Vision-Language Models (VLMs) to tackle this challenge, exploiting their zero-shot learning capabilities. However, these methods primarily address domain distribution shifts and overlook the misalignment between the detection task and VLMs, which rely on manually crafted prompts. To overcome these limitations, we propose the unified prompt and representation enhancement (UPRE) framework, which jointly optimizes both textual prompts and visual representations. Specifically, our approach introduces a multi-view domain prompt that combines linguistic domain priors with detection-specific knowledge, and a visual representation enhancement module that produces domain style variations. Furthermore, we introduce multi-level enhancement strategies, including relative domain distance and positive-negative separation, which align multi-modal representations at the image level and capture diverse visual representations at the instance level, respectively. Extensive experiments conducted on nine benchmark datasets demonstrate the superior performance of our framework in ZSDA detection scenarios. Code is available at this https URL.

@article{zhang2025_2507.00721,
  title={UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement},
  author={Xiao Zhang and Fei Wei and Yong Wang and Wenda Zhao and Feiyi Li and Xiangxiang Chu},
  journal={arXiv preprint arXiv:2507.00721},
  year={2025}
}
