Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs

IEEE Transactions on Image Processing (IEEE TIP), 2016
Abstract

Thanks to available large-scale scene datasets such as Places and Places2, Convolutional Neural Networks (CNNs) have made remarkable progress on the problem of scene recognition. However, scene categories are often defined according to their functions, and large intra-class variations exist within a single scene category. Meanwhile, as the number of scene classes grows, some classes tend to overlap with others, and label ambiguity becomes a problem. This paper focuses on large-scale scene recognition and makes two major contributions to tackle these issues. First, we propose a multi-resolution CNN architecture to capture visual content and structure at different scales. Our multi-resolution CNNs are composed of coarse-resolution CNNs and fine-resolution CNNs, whose performance is complementary. Second, we design two knowledge-guided disambiguation techniques to deal with label ambiguity. In the first scenario, we exploit the knowledge from the confusion matrix computed on validation data to merge similar classes into a super category, while in the second scenario, we utilize the knowledge of extra networks to produce a soft label for each image. Both the super-category information and the soft labels are exploited to train CNNs on the Places2 dataset. We conduct experiments on three large-scale image classification datasets (ImageNet, Places, Places2) to demonstrate the effectiveness of our approach. In addition, our method was entered in two major scene recognition challenges, achieving 2nd place at the Places2 challenge 2015 and 1st place at the LSUN challenge 2016. Finally, we transfer the learned representations to the MIT Indoor67 and SUN397 datasets, which yields state-of-the-art performance (86.7% and 72.0%) on both.
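The soft-label disambiguation described above can be sketched as a training loss that blends the ground-truth (hard) label with the soft label produced by an extra network. The sketch below is illustrative only: the mixing weight `alpha` and the exact form of the soft-label term are assumptions, not the paper's precise formulation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def disambiguation_loss(logits, hard_label, soft_label, alpha=0.5):
    """Blend hard-label and soft-label cross-entropy.

    `alpha` is a hypothetical mixing weight (not from the paper);
    `soft_label` is the probability vector produced by an extra network.
    """
    p = softmax(logits)
    ce_hard = -math.log(p[hard_label])                         # standard cross-entropy
    ce_soft = -sum(q * math.log(pi) for q, pi in zip(soft_label, p))  # vs. soft target
    return (1 - alpha) * ce_hard + alpha * ce_soft

# Example: a 3-class prediction with a slightly ambiguous soft label.
loss = disambiguation_loss([2.0, 0.5, 0.1], hard_label=0,
                           soft_label=[0.7, 0.2, 0.1], alpha=0.5)
```

When the soft label collapses to a one-hot vector at the ground-truth class, the blended loss reduces to ordinary cross-entropy, so the soft-label term only changes training on images whose label is genuinely ambiguous.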
