Mixture-Model-based Bounding Box Density Estimation for Object Detection

IEEE International Conference on Computer Vision (ICCV), 2019

28 November 2019


Abstract

In this paper, we reformulate the multi-object detection task as density estimation of bounding boxes. We propose a new object detection network, the Mixture-Model-based Object Detector (MMOD), which performs multi-object detection through density estimation using a mixture model. MMOD captures the conditional distribution of bounding boxes for a given input image using a mixture model consisting of Gaussian and categorical distributions. To this end, we also propose a new network structure and objective function for MMOD. MMOD is not trained by assigning ground truth bounding boxes to specific locations of the network's output. Instead, the mixture components are automatically learned to represent the distribution of bounding boxes through density estimation. As a result, MMOD is trained without ground truth assignment and does not suffer from the foreground-background imbalance problem, since background bounding boxes are stochastically sampled from the mixture model that estimates the ground truth bounding box distribution. We applied MMOD to the MS COCO and Pascal VOC datasets and observed that MMOD achieves a better trade-off between speed and detection performance than other detection methods. Code will be available.
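To make the density-estimation view concrete, the following is a minimal sketch of a negative log-likelihood for labelled boxes under a mixture whose components pair a Gaussian over box coordinates with a categorical distribution over classes. It is a generic illustration, not the paper's objective function: the diagonal covariance, the 4-number box encoding, and every name (`mixture_nll`, `weights`, `means`, `stds`, `class_probs`) are assumptions made here. In a detector, these parameters would presumably be produced by the network for each image.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def mixture_nll(boxes, labels, weights, means, stds, class_probs):
    """Mean negative log-likelihood of (box, label) pairs under a mixture.

    Assumed (hypothetical) layout:
      boxes       : list of N boxes, each a list of 4 coordinates
      labels      : list of N integer class ids
      weights     : K mixture weights, positive and summing to 1
      means, stds : K lists of 4 values (diagonal Gaussian, an assumption)
      class_probs : K lists of C strictly positive class probabilities
    """
    total = 0.0
    for box, label in zip(boxes, labels):
        log_terms = []
        for k in range(len(weights)):
            # Gaussian part: independent coordinates, so log densities add.
            log_box = sum(gaussian_log_pdf(x, m, s)
                          for x, m, s in zip(box, means[k], stds[k]))
            # Categorical part: probability of the box's class in component k.
            log_cls = math.log(class_probs[k][label])
            log_terms.append(math.log(weights[k]) + log_box + log_cls)
        # Marginalise over components; no box is assigned to a component.
        total -= log_sum_exp(log_terms)
    return total / len(boxes)
```

Note that the loss sums over all components for every box, so no ground-truth box is matched to a particular output location; components that explain a box well simply contribute more to its likelihood.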
