A Linear Approximation to the chi^2 Kernel with Geometric Convergence
In this paper we present an inference procedure for the semantic segmentation of images. Unlike many CRF approaches, which rely on dependencies modeled with unary and pairwise potentials over pixels or superpixels, our method is based entirely on estimates of the overlap between each of a set of mid-level object segmentation proposals with large spatial support and the objects present in the image. Continuous latent variables that model the overlap between each object segmentation proposal and each ground-truth object region are defined at the level of superpixels resulting from segment intersections. Inference for the optimal layout, which involves segment refinement and recombination as well as handling multiple interacting objects in one image, even objects of the same class, is performed jointly by maximizing the composite likelihood of the underlying model with an EM algorithm. On the PASCAL VOC segmentation challenge, the proposed approach obtains top accuracy and successfully handles images showing complex object interactions.
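To make the phrase "superpixels resulting from segment intersections" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that proposals are given as binary masks, that a superpixel is the set of pixels covered by the same subset of proposals (connectivity is ignored here), and that overlap is measured as intersection-over-union. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def intersection_superpixels(masks):
    """Group pixels by the subset of proposals that cover them.

    masks: list of K binary masks, each a list of rows of 0/1 values.
    Returns a dict mapping a membership signature (tuple of K bits)
    to the set of (row, col) pixels sharing that signature.
    """
    height, width = len(masks[0]), len(masks[0][0])
    groups = defaultdict(set)
    for r in range(height):
        for c in range(width):
            signature = tuple(m[r][c] for m in masks)
            groups[signature].add((r, c))
    return dict(groups)


def overlap(region_a, region_b):
    """Intersection-over-union of two pixel sets (assumed overlap measure)."""
    union = region_a | region_b
    if not union:
        return 0.0
    return len(region_a & region_b) / len(union)


# Toy 2x4 image with two overlapping proposals (hypothetical data).
proposal_a = [[1, 1, 1, 0],
              [1, 1, 1, 0]]
proposal_b = [[0, 0, 1, 1],
              [0, 0, 1, 1]]
superpixels = intersection_superpixels([proposal_a, proposal_b])

# Overlap between proposal A and a hypothetical ground-truth object region.
pixels_a = {(r, c) for r in range(2) for c in range(4) if proposal_a[r][c]}
ground_truth = {(r, c) for r in range(2) for c in range(2)}
overlap_a = overlap(pixels_a, ground_truth)
```

In this toy case the two proposals split the image into three superpixels: pixels covered only by A, only by B, and by both. The EM-based composite likelihood inference described in the abstract is not reproduced here.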