Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

3 July 2025

De Cheng

Zhipeng Xu

Xinyang Jiang

Dongsheng Li

Nannan Wang

Xinbo Gao


Main: 14 Pages

15 Figures

Bibliography: 3 Pages

Abstract

Domain Generalization (DG) seeks to develop a versatile model that performs effectively on unseen target domains. Recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have shown considerable potential for improving the generalization of deep learning models. Despite increasing attention to VFM-based domain prompt tuning in DG, designing prompts that disentangle invariant features across diverse domains remains a critical challenge. In this paper, we address this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns a domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts with an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints keep visual representations consistent across the original and augmented distributions. Experiments on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.


@article{cheng2025_2507.02288,
  title={Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization},
  author={De Cheng and Zhipeng Xu and Xinyang Jiang and Dongsheng Li and Nannan Wang and Xinbo Gao},
  journal={arXiv preprint arXiv:2507.02288},
  year={2025}
}
