The success of sparse representations in image modeling has motivated their use in computer vision applications. In complex visual recognition tasks, it is typical to adopt multiple descriptors that describe different aspects of the data in order to improve recognition performance. Descriptors that have diverse forms can be fused into a unified feature space in a principled manner using kernel methods. Learning sparse models in the resulting space provides highly discriminative sparse codes for object recognition and unsupervised clustering. To this end, we develop the paradigm of multiple kernel sparse coding and propose two different approaches to optimize dictionaries for the feature space representations. The first approach builds a separate dictionary for each descriptor set in its own feature space and then optimizes these dictionaries for efficient representation in the unified feature space. The second approach, in contrast, learns dictionaries in the unified feature space directly using the ensemble kernel matrices and hence provides greater flexibility in the choice of kernel functions. Finally, we evaluate the utility of multiple kernel sparse codes obtained with the proposed approaches in object recognition and clustering applications. We demonstrate that fusing multiple descriptors improves performance compared to using each descriptor individually.
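As a rough illustration of how descriptors can be fused through kernels, the sketch below builds one kernel matrix per descriptor set and combines them into a single ensemble kernel matrix. The abstract does not state how the kernels are combined, so the non-negative weighted sum used here is an assumption (a common multiple kernel construction), and the descriptor names, kernel choices, weights, and the helper `ensemble_kernel` are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by round-off.
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def linear_kernel(X):
    """Linear kernel matrix (plain inner products) for the rows of X."""
    return X @ X.T

def ensemble_kernel(kernels, weights):
    """Hypothetical helper: non-negative weighted sum of base kernel matrices.

    A non-negative combination of positive semidefinite kernels is itself
    positive semidefinite, so the result is a valid kernel matrix.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * K for w, K in zip(weights, kernels))

# Synthetic stand-ins for two descriptor sets computed on the same 6 images.
rng = np.random.default_rng(0)
n_samples = 6
color_desc = rng.normal(size=(n_samples, 4))   # assumed 4-dim descriptor
shape_desc = rng.normal(size=(n_samples, 9))   # assumed 9-dim descriptor

base_kernels = [rbf_kernel(color_desc, gamma=0.5), linear_kernel(shape_desc)]
K = ensemble_kernel(base_kernels, weights=[0.7, 0.3])
```

The descriptors may have different dimensions and different kernels, yet each base kernel matrix is `n_samples x n_samples`, which is what allows them to be fused. Sparse coding and dictionary learning in the unified feature space would then operate on `K` rather than on the raw descriptors.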