Scale-Invariant Feature Learning using Deconvolutional Neural Networks for Weakly-Supervised Semantic Segmentation
- SSeg

A weakly-supervised semantic segmentation framework using tied deconvolutional neural networks is proposed for scale-invariant feature learning. Each deconvolution layer in the proposed framework consists of unpooling and deconvolution operations. 'Unpooling' upsamples the input feature map based on unpooling switches defined by the pooling operation of the corresponding convolution layer. 'Deconvolution' convolves the unpooled features using convolutional weights tied to those of the corresponding convolution layer. This unpooling-deconvolution combination reduces false positives, since the output features of each deconvolution layer are reconstructed from the most discriminative unpooled features rather than from the raw ones. The feature maps restored across all deconvolution layers constitute a rich feature set spanning different abstraction levels, and these features are selectively used to generate class-specific activation maps. Under weak supervision (image-level labels only), the proposed framework shows promising results on medical images (chest X-rays) and achieves state-of-the-art performance on the PASCAL VOC segmentation dataset under the same experimental conditions.
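The unpooling-with-switches mechanism described above can be sketched as a toy NumPy example. This is an illustrative single-channel sketch under assumed details (2x2 windows, one feature map), not the authors' implementation; the function names are hypothetical:

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """2x2 max pooling that also records, per window, the flat index of the
    maximum ('unpooling switch') so the upsampling step can restore locations."""
    H, W = x.shape
    out = np.zeros((H // k, W // k))
    switches = np.zeros((H // k, W // k), dtype=int)
    for i in range(H // k):
        for j in range(W // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = win.argmax()  # position of the max inside the window
            out[i, j] = win.max()
    return out, switches

def unpool(y, switches, k=2):
    """Upsample by placing each value back at its recorded switch position;
    all other positions stay zero, keeping only the most discriminative
    activations (this is what suppresses false positives downstream)."""
    H, W = y.shape
    out = np.zeros((H * k, W * k))
    for i in range(H):
        for j in range(W):
            di, dj = divmod(switches[i, j], k)
            out[i*k + di, j*k + dj] = y[i, j]
    return out

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [0., 1., 5., 2.],
              [2., 0., 3., 1.]])
pooled, sw = max_pool_with_switches(x)
restored = unpool(pooled, sw)  # sparse map: maxima back at original positions
```

The tied 'deconvolution' step would then convolve `restored` with the flipped kernel of the corresponding convolution layer (i.e., a transposed convolution sharing the forward weights), rather than learning a separate decoder filter.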