ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking

12 April 2025
Tzoulio Chamiti
Leandro Di Bella
Adrian Munteanu
Nikos Deligiannis
Abstract

Tracking multiple objects based on textual queries is a challenging task that requires linking language understanding with object association across frames. Previous works typically either train the whole process end-to-end or integrate an additional referring text module into a multi-object tracker; both approaches require supervised training and may struggle to generalize to open-set queries. In this work, we introduce ReferGPT, a novel zero-shot referring multi-object tracking framework. We provide a multi-modal large language model (MLLM) with spatial knowledge, enabling it to generate 3D-aware captions. This enhances its descriptive capabilities and supports a more flexible referring vocabulary without training. We also propose a robust query-matching strategy that leverages CLIP-based semantic encoding and fuzzy matching to associate MLLM-generated captions with user queries. Extensive experiments on Refer-KITTI, Refer-KITTIv2 and Refer-KITTI+ demonstrate that ReferGPT achieves competitive performance against trained methods, showcasing its robustness and zero-shot capabilities in autonomous driving. The code is available at this https URL
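
The query-matching idea in the abstract (a semantic similarity score combined with fuzzy string matching between generated captions and a user query) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two scores are combined, so the weighted sum, the `alpha` weight, and the `threshold` are assumptions. The `toy_embed` function is a hypothetical bag-of-words stand-in for a CLIP text encoder, used only to keep the example self-contained, and the track captions are made-up illustrative strings.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def toy_embed(text):
    """Hypothetical stand-in for a CLIP text encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy(a, b):
    """Character-level fuzzy similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(caption, query, alpha=0.5):
    """Assumed combination: weighted sum of semantic and fuzzy similarity."""
    semantic = cosine(toy_embed(caption), toy_embed(query))
    return alpha * semantic + (1 - alpha) * fuzzy(caption, query)


def select_tracks(captions, query, threshold=0.5):
    """Return IDs of tracks whose caption scores at or above the assumed threshold."""
    return [tid for tid, cap in captions.items()
            if match_score(cap, query) >= threshold]


# Illustrative captions keyed by track ID (made up for this example).
captions = {
    1: "a red car moving in the same direction",
    2: "a pedestrian walking on the left sidewalk",
}
query = "red car moving in the same direction"
matched = select_tracks(captions, query)
```

In a real pipeline, `toy_embed` would be replaced by embeddings from an actual text encoder, and the weight and threshold would need tuning on held-out data.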

@article{chamiti2025_2504.09195,
  title={ReferGPT: Towards Zero-Shot Referring Multi-Object Tracking},
  author={Tzoulio Chamiti and Leandro Di Bella and Adrian Munteanu and Nikos Deligiannis},
  journal={arXiv preprint arXiv:2504.09195},
  year={2025}
}