404
v1v2v3 (latest)

FiGO: Fine-Grained Object Counting without Annotations

Main:9 Pages
7 Figures
Bibliography:3 Pages
7 Tables
Abstract

Class-agnostic counting (CAC) methods reduce annotation costs by letting users define what to count at test-time through text or visual exemplars. However, current open-vocabulary approaches work well for broad categories but fail when fine-grained category distinctions are needed, such as telling apart waterfowl species or pepper cultivars. We present FiGO, a new annotation-free method that adapts existing counting models to fine-grained categories using only the category name. Our approach uses a text-to-image diffusion model to create synthetic examples and a joint positive/hard-negative loss to learn a compact concept embedding that conditions a specialization module to convert outputs from any frozen counter into accurate, fine-grained estimates. To evaluate fine-grained counting, we introduce LOOKALIKES, a dataset of 37 subcategories across 14 parent categories with many visually similar objects per image. Our method substantially outperforms strong open-vocabulary baselines, moving counting systems from "count all the peppers" to "count only the habaneros."

View on arXiv
Comments on this paper