Efficient piecewise training of deep structured models for semantic segmentation

Computer Vision and Pattern Recognition (CVPR), 2015
Guosheng Lin
Chunhua Shen
Abstract

Recent advances in semantic image segmentation have mostly been achieved by training deep convolutional neural networks (CNNs) for the task. We show how to improve semantic segmentation through the use of contextual information. Specifically, we explore 'patch-patch' context and 'patch-background' context with deep CNNs. For learning the patch-patch context between image regions, we formulate Conditional Random Fields (CRFs) with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied to avoid repeated expensive CRF inference during backpropagation. To capture the patch-background context, we show that a network design with traditional multi-scale image input and sliding pyramid pooling is effective for improving performance. Our experimental results set new state-of-the-art performance on a number of popular semantic segmentation datasets, including NYUDv2, PASCAL VOC 2012, PASCAL-Context, and SIFT Flow. In particular, we achieve an intersection-over-union score of 77.8 on the challenging PASCAL VOC 2012 dataset.
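The piecewise training idea referenced in the abstract can be sketched as follows. This is a hedged reconstruction using assumed notation (the symbols U, V, and the neighborhood sets are not spelled out in the abstract): the CRF energy combines CNN-based unary and pairwise potentials, and piecewise training replaces the globally normalized likelihood with a product of locally normalized factors, so no global CRF inference is needed inside the training loop.

```latex
% CRF over labels y given image x, with CNN-parameterized potentials
% (notation assumed for illustration):
P(\mathbf{y}\mid\mathbf{x};\boldsymbol{\theta})
  \propto \exp\bigl(-E(\mathbf{y},\mathbf{x};\boldsymbol{\theta})\bigr),
\qquad
E(\mathbf{y},\mathbf{x};\boldsymbol{\theta})
  = \sum_{p} U(y_p,\mathbf{x};\boldsymbol{\theta}_U)
  + \sum_{(p,q)\in\mathcal{E}} V(y_p,y_q,\mathbf{x};\boldsymbol{\theta}_V).

% Piecewise approximation: each potential is normalized locally,
% avoiding the global partition function (and hence global inference):
P(\mathbf{y}\mid\mathbf{x})
  \approx \prod_{p} P_U(y_p\mid\mathbf{x})
          \prod_{(p,q)\in\mathcal{E}} P_V(y_p,y_q\mid\mathbf{x}),
\qquad
P_U(y_p\mid\mathbf{x})
  = \frac{\exp\bigl(-U(y_p,\mathbf{x})\bigr)}
         {\sum_{y'_p}\exp\bigl(-U(y'_p,\mathbf{x})\bigr)}.
```

Each factor is a small softmax over one node (or one pair of nodes), so the training objective decomposes into independent cross-entropy-like terms that backpropagate directly into the unary and pairwise CNNs.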
