DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection

Abstract

Open-vocabulary object detectors can recognize a wide range of categories from simple textual prompts. However, improving their ability to detect rare classes or to specialize in particular domains remains a challenge. While most recent methods adapt a single set of model weights, we take a different approach based on modular deep learning. We introduce DitHub, a framework designed to create and manage a library of efficient adaptation modules. Inspired by Version Control Systems, DitHub organizes expert modules like branches that can be fetched and merged as needed. This modular design enables a detailed study of how adaptation modules combine, which makes DitHub the first method to explore this aspect in object detection. Our approach achieves state-of-the-art performance on the ODinW-13 benchmark and on ODinW-O, a newly introduced benchmark designed to evaluate how well models adapt when previously seen classes reappear. For more details, visit our project page: this https URL

@article{cappellino2025_2503.09271,
  title={DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection},
  author={Chiara Cappellino and Gianluca Mancusi and Matteo Mosconi and Angelo Porrello and Simone Calderara and Rita Cucchiara},
  journal={arXiv preprint arXiv:2503.09271},
  year={2025}
}