arXiv:2310.16667
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

25 October 2023
Chuofan Ma
Yi-Xin Jiang
Xin Wen
Zehuan Yuan
Xiaojuan Qi
    ObjD
    VLM
Abstract

Deriving reliable region-word alignment from image-text pairs is critical to learning object-level vision-language representations for open-vocabulary object detection. Existing methods typically rely on pre-trained or self-trained vision-language models for alignment, which are prone to limitations in localization accuracy or generalization capability. In this paper, we propose CoDet, a novel approach that overcomes the reliance on a pre-aligned vision-language space by reformulating region-word alignment as a co-occurring object discovery problem. Intuitively, by grouping images that mention a shared concept in their captions, objects corresponding to the shared concept should exhibit high co-occurrence within the group. CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept. Extensive experiments demonstrate that CoDet achieves superior performance and compelling scalability in open-vocabulary detection, e.g., by scaling up the visual backbone, CoDet achieves 37.0 $\text{AP}^m_{novel}$ and 44.7 $\text{AP}^m_{all}$ on OV-LVIS, surpassing the previous SoTA by 4.2 $\text{AP}^m_{novel}$ and 9.8 $\text{AP}^m_{all}$. Code is available at https://github.com/CVMI-Lab/CoDet.
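
The toy PyTorch sketch below illustrates the co-occurrence idea the abstract describes: group images whose captions share a concept, then use cross-image visual similarity to pick out the region in each image most likely to depict that concept. It is a minimal reconstruction for illustration only, not the paper's method: the helper names (group_by_concept, discover_co_occurring_regions), the 512-dimensional region features, the max-similarity selection rule, and the cosine alignment loss are all assumptions; the authors' actual implementation is in the linked repository.

# Toy illustration of co-occurrence guided region-word alignment.
# NOT the authors' implementation; all names and shapes are hypothetical.
import torch
import torch.nn.functional as F

def group_by_concept(captions, concept):
    # Group images whose captions mention a shared concept.
    return [i for i, c in enumerate(captions) if concept in c.lower()]

def discover_co_occurring_regions(region_feats):
    # For each image in a concept group (needs >= 2 images), select the
    # region most visually similar to a region in the other images; this
    # is taken as the object co-occurring across the group.
    feats = [F.normalize(f, dim=-1) for f in region_feats]
    selected = []
    for i, fi in enumerate(feats):
        others = torch.cat([f for j, f in enumerate(feats) if j != i])
        best_match = (fi @ others.T).max(dim=-1).values  # per-region score
        selected.append(region_feats[i][best_match.argmax()])
    return torch.stack(selected)

# Usage with random stand-in features (3 captions, 5 region proposals each):
captions = ["a dog on grass", "two dogs playing", "a cat sleeping"]
group = group_by_concept(captions, "dog")                 # -> [0, 1]
region_feats = [torch.randn(5, 512) for _ in captions]
regions = discover_co_occurring_regions([region_feats[i] for i in group])
concept_emb = F.normalize(torch.randn(512), dim=0)        # stand-in text embedding
align_loss = (1 - F.normalize(regions, dim=-1) @ concept_emb).mean()

In the actual method the alignment would be learned end to end (e.g., against caption word embeddings over real region proposals); the sketch only shows the grouping-then-similarity structure that replaces a pre-aligned vision-language space.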
