LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification

Abstract

Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.

@article{lu2025_2504.10174,
  title={LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification},
  author={Yiding Lu and Mouxing Yang and Dezhong Peng and Peng Hu and Yijie Lin and Xi Peng},
  journal={arXiv preprint arXiv:2504.10174},
  year={2025}
}