
UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement

Xiao Zhang
Fei Wei
Yong Wang
Wenda Zhao
Feiyi Li
Xiangxiang Chu
Main: 8 pages · Bibliography: 3 pages · Appendix: 4 pages · 9 figures · 14 tables
Abstract

Zero-shot domain adaptation (ZSDA) presents substantial challenges due to the lack of images in the target domain. Previous approaches leverage Vision-Language Models (VLMs) to tackle this challenge, exploiting their zero-shot learning capabilities. However, these methods primarily address domain distribution shifts and overlook the misalignment between the detection task and VLMs, which rely on manually crafted prompts. To overcome these limitations, we propose the unified prompt and representation enhancement (UPRE) framework, which jointly optimizes both textual prompts and visual representations. Specifically, our approach introduces a multi-view domain prompt that combines linguistic domain priors with detection-specific knowledge, and a visual representation enhancement module that produces domain style variations. Furthermore, we introduce multi-level enhancement strategies, including relative domain distance and positive-negative separation, which align multi-modal representations at the image level and capture diverse visual representations at the instance level, respectively. Extensive experiments conducted on nine benchmark datasets demonstrate the superior performance of our framework in ZSDA detection scenarios. Code is available at this https URL.

@article{zhang2025_2507.00721,
  title={UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement},
  author={Xiao Zhang and Fei Wei and Yong Wang and Wenda Zhao and Feiyi Li and Xiangxiang Chu},
  journal={arXiv preprint arXiv:2507.00721},
  year={2025}
}