Unsupervised Learning of Spatiotemporally Coherent Metrics

IEEE International Conference on Computer Vision (ICCV), 2014

18 December 2014

Joan Bruna

Abstract

Current state-of-the-art object detection and recognition algorithms rely on supervised training, and most benchmark datasets contain only static images. In this work we study feature learning in the context of temporally coherent video data. We focus on training convolutional features on unlabeled video data, using only the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We show that nearest neighbors of a query frame in the learned feature space are more likely to be its temporal neighbors.

View on arXiv

Comments on this paper