Material Recognition from Local Appearance in Global Context
Recognition of materials has proven to be a challenging problem due to the wide variation in appearance within and between categories. Many recent material recognition methods treat materials as yet another set of labels, much like objects. Materials are, however, fundamentally different from objects: they have no inherent shape or defined spatial extent. This makes local material recognition particularly hard. Global image context, such as where the material is or what object it makes up, can be crucial to recognizing the material. Existing methods, however, operate on an implicit fusion of materials and context by taking large receptive fields as input (i.e., large image patches). Such an approach can exploit only the limited context that appears during training and is bounded by the material-context combinations seen in the training data. We instead show that recognizing materials purely from their local appearance and integrating separately recognized global contextual cues, including objects and places, leads to superior dense, per-pixel material recognition. We achieve this by training a fully convolutional material recognition network end-to-end with only material category supervision. We integrate object and place estimates into this network from independent CNNs. This approach avoids the need to prepare an infeasible amount of training data covering the product space of materials, objects, and scenes, while fully leveraging contextual cues for dense material recognition. Experimental results validate the effectiveness of our approach and show that our method outperforms past methods that build on inseparable material and contextual information.
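To make the fusion idea concrete, below is a minimal sketch (not the authors' code) of how per-pixel material features from a fully convolutional network could be combined with object and place estimates produced by independent CNNs: object logits are upsampled to the material feature resolution, the global place vector is broadcast over all pixels, and a 1x1 convolution fuses the concatenated channels into per-pixel material logits. The module name, channel counts, and category sizes are illustrative assumptions, assuming PyTorch.

```python
# Hypothetical sketch of context fusion for dense material recognition.
# All names and dimensions are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFusionHead(nn.Module):
    """Fuses local material features with separately predicted object and
    place context via a 1x1 convolution over concatenated channels."""
    def __init__(self, mat_ch, obj_ch, place_ch, num_materials):
        super().__init__()
        self.fuse = nn.Conv2d(mat_ch + obj_ch + place_ch,
                              num_materials, kernel_size=1)

    def forward(self, mat_feats, obj_logits, place_logits):
        # mat_feats:    (B, mat_ch, H, W)  from the fully convolutional material net
        # obj_logits:   (B, obj_ch, h, w)  from an independent object CNN
        # place_logits: (B, place_ch)      from an independent scene/place CNN
        B, _, H, W = mat_feats.shape
        # Resample object estimates to the material feature resolution.
        obj = F.interpolate(obj_logits, size=(H, W),
                            mode="bilinear", align_corners=False)
        # The place estimate is global, so broadcast it to every pixel.
        place = place_logits[:, :, None, None].expand(-1, -1, H, W)
        fused = torch.cat([mat_feats, obj, place], dim=1)
        return self.fuse(fused)  # (B, num_materials, H, W) per-pixel logits

# Usage with illustrative sizes (e.g., 23 material classes):
head = ContextFusionHead(mat_ch=256, obj_ch=150, place_ch=365, num_materials=23)
mat = torch.randn(4, 256, 64, 64)
obj = torch.randn(4, 150, 32, 32)
place = torch.randn(4, 365)
print(head(mat, obj, place).shape)  # torch.Size([4, 23, 64, 64])
```

Keeping the object and place networks independent, as the abstract describes, means each branch can be trained on its own data; only the fusion layer needs to learn how context modulates material evidence, rather than learning every material-object-scene combination from scratch.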