
Domain-Size Pooling in Local Descriptors: DSP-SIFT

Computer Vision and Pattern Recognition (CVPR), 2014
Abstract

We introduce a simple modification of local image descriptors, such as SIFT, that improves matching performance by 43.09% on the Oxford image matching benchmark and is implementable in a few lines of code. To put it in perspective, this is more than half of the improvement that SIFT provides over raw image intensities on the same datasets. The trick consists of pooling gradient orientations across different domain sizes, in addition to spatial locations, and yields a descriptor of the same dimension as the original, which we call DSP-SIFT. With domain-size pooling, DSP-SIFT outperforms by 28.29% a Convolutional Neural Network, which in turn has recently been reported to outperform ordinary SIFT by 11.54%. This is despite the network being trained on millions of images and outputting a descriptor of considerably larger size. Domain-size pooling is counter-intuitive and contrary to the practice of scale selection as taught in scale-space theory, but has solid roots in classical sampling theory.
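
To make the idea of pooling across domain sizes concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified SIFT-like descriptor (a 4x4 grid of 8-bin gradient-orientation histograms on a patch resampled to 16x16 pixels), uniform averaging as the pooling rule, and an arbitrary set of scale factors; the function names `sample_patch`, `orientation_histograms`, and `dsp_descriptor` are hypothetical. The point it illustrates is that the pooled descriptor keeps the dimension of the single-size one.

```python
import numpy as np


def sample_patch(image, cx, cy, radius, size=16):
    """Resample a square domain of half-width `radius` to size x size (nearest neighbor)."""
    offsets = np.linspace(-radius, radius, size)
    ys = np.clip(np.rint(cy + offsets).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.rint(cx + offsets).astype(int), 0, image.shape[1] - 1)
    return image[np.ix_(ys, xs)].astype(float)


def orientation_histograms(patch, grid=4, bins=8):
    """Simplified SIFT-like descriptor: magnitude-weighted orientation histograms per cell."""
    gy, gx = np.gradient(patch)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bin_idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    cell = patch.shape[0] // grid
    hist = np.zeros((grid, grid, bins))
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(
                bin_idx[sl].ravel(), weights=mag[sl].ravel(), minlength=bins
            )
    return hist.ravel()  # grid * grid * bins values (128 by default)


def dsp_descriptor(image, cx, cy, base_radius,
                   scale_factors=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Pool histograms over several domain sizes around the same location.

    Assumption: uniform averaging over an illustrative set of scale factors.
    """
    pooled = np.mean(
        [orientation_histograms(sample_patch(image, cx, cy, base_radius * s))
         for s in scale_factors],
        axis=0,
    )
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled
```

Calling `dsp_descriptor(image, cx, cy, base_radius)` on a grayscale NumPy array returns a unit-norm vector of length 128, the same length that `orientation_histograms` produces for a single domain size.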
