Monocular Object Instance Segmentation and Depth Ordering with CNNs
In this paper we tackle the problem of instance level segmentation and depth ordering from a single monocular image. Towards this goal, we take advantage of convolutional neural nets and train them to directly predict instance level segmentations where the instance ID encodes depth ordering from large image patches. Importantly, to train the deep network we only employ weak labels in the form of 3D bounding boxes. To provide a coherent single explanation of an image we develop a Markov random field which takes as input the predictions of convolutional nets applied at overlapping patches of different resolutions as well as the output of a connected component algorithm and predicts very accurate instance level segmentation and depth ordering. We demonstrate the effectiveness of our approach on the challenging KITTI benchmark and show very good performance on both tasks.
View on arXiv