Exploring Part-Informed Visual-Language Learning for Person Re-Identification

4 August 2023

Yin Lin, Yehansen Chen, Baocai Yin, Jinshui Hu, Bing Yin, Cong Liu, Zengfu Wang

Abstract

Recently, visual-language learning (VLL) has shown great potential in enhancing visual-based person re-identification (ReID). Existing VLL-based ReID methods typically focus on image-text feature alignment at the whole-body level while neglecting supervision on fine-grained part features, and therefore lack constraints on the semantic consistency of local features. To this end, we propose Part-Informed Visual-language Learning ($\pi$-VL) to enhance fine-grained visual features with part-informed language supervision for ReID tasks. Specifically, $\pi$-VL introduces a human parsing-guided prompt tuning strategy and a hierarchical visual-language alignment paradigm to ensure within-part feature semantic consistency. The former combines identity labels and human parsing maps to constitute pixel-level text prompts, and the latter fuses multi-scale visual features with a light-weight auxiliary head to perform fine-grained image-text alignment. As a plug-and-play and inference-free solution, our $\pi$-VL achieves performance comparable to or better than state-of-the-art methods on four commonly used ReID benchmarks. Notably, it reports 91.0% Rank-1 and 76.9% mAP on the challenging MSMT17 database, without bells and whistles.

@article{lin2025_2308.02738,
  title={Exploring Part-Informed Visual-Language Learning for Person Re-Identification},
  author={Yin Lin and Yehansen Chen and Baocai Yin and Jinshui Hu and Bing Yin and Cong Liu and Zengfu Wang},
  journal={arXiv preprint arXiv:2308.02738},
  year={2025}
}
