Unsupervised Learning of Spatiotemporally Coherent Metrics
IEEE International Conference on Computer Vision (ICCV), 2014
- DRL
Abstract
Current state-of-the-art object detection and recognition algorithms rely on supervised training, and most benchmark datasets contain only static images. In this work we study feature learning in the context of temporally coherent video data. We focus on training convolutional features on unlabeled video data, using only the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We show that nearest neighbors of a query frame in the learned feature space are more likely to be its temporal neighbors.
View on arXivComments on this paper
