
Fine-grained Visual Categorization using PAIRS: Pose and Appearance Integration for Recognizing Subcategories

Abstract

Fine-grained Visual Categorization (FGVC) saw a tremendous boost between 2013 and 2016 with the incorporation of deep learning; however, progress has recently begun to slow. In this work, we postulate that one key to continued advances in fine-grained recognition performance is a better and, specifically, more explicit understanding of pose and appearance. We propose a model that predicts an object's pose and then describes its appearance relative to the estimated pose. Our representation, which leverages pose-aligned appearance patches, achieves state-of-the-art performance on two key fine-grained datasets, CUB-200 and NABirds, most notably raising the standard for the widely used CUB-200 dataset by nearly 2% to 89.2%.
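
To make the idea of a pose-aligned appearance patch concrete, the following is a minimal sketch of one way such a patch could be cut from an image once two keypoints have been estimated. It is an illustration only, not the model described in this paper: the function name `pose_aligned_patch`, its parameters, the use of a keypoint pair, and the nearest-neighbor sampling are all assumptions made for the example.

```python
import numpy as np

def pose_aligned_patch(image, kp_a, kp_b, out_size=8, pad=1.5):
    """Hypothetical helper: sample a square patch whose horizontal axis
    runs along the segment kp_a -> kp_b (keypoints given as (x, y))."""
    a = np.asarray(kp_a, dtype=float)
    b = np.asarray(kp_b, dtype=float)
    center = (a + b) / 2.0              # patch is centered between the keypoints
    axis = b - a
    length = np.linalg.norm(axis)
    if length == 0:
        raise ValueError("keypoints must be distinct")
    u = axis / length                   # unit vector along the keypoint pair
    v = np.array([-u[1], u[0]])         # unit vector perpendicular to it
    half = pad * length / 2.0           # half side length of the patch in pixels

    # Patch-local coordinates: columns move along u, rows move along v.
    t = np.linspace(-half, half, out_size)
    along, across = np.meshgrid(t, t)

    # Map patch-local coordinates back into image coordinates.
    x = center[0] + along * u[0] + across * v[0]
    y = center[1] + along * u[1] + across * v[1]

    # Nearest-neighbor sampling (an assumption; a real system would more
    # likely interpolate), clamped to the image bounds.
    xi = np.clip(np.rint(x).astype(int), 0, image.shape[1] - 1)
    yi = np.clip(np.rint(y).astype(int), 0, image.shape[0] - 1)
    return image[yi, xi]
```

Because the sampling grid is defined relative to the keypoint pair, the same object part lands in the same patch location regardless of how the object is rotated or scaled in the image, which is the property an appearance descriptor computed "relative to the estimated pose" relies on.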
