RATM: Recurrent Attentive Tracking Model

Abstract

This work presents an attention-based approach to tracking objects in video. A recurrent neural network is trained to predict the position of an object at time t+1 given a series of selective glimpses at times 1 to t. Glimpses are selected by a differentiable (soft-)attention mechanism, which makes it possible to train the model end-to-end with standard stochastic gradient descent. Experiments on artificial datasets demonstrate the importance of various design choices and show that the model is able to perform simple tracking tasks.
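The differentiable glimpse described above can be sketched as a grid of Gaussian filters applied to the image, in the style of DRAW-like attention. The sketch below is a minimal NumPy illustration, not the paper's implementation; the function name and parameters (`stride`, `sigma`, grid size `N`) are illustrative assumptions.

```python
import numpy as np

def gaussian_glimpse(image, center_y, center_x, stride, sigma, N=8):
    """Extract an N x N soft-attention glimpse from a 2-D image.

    An N x N grid of Gaussian filters is centered at (center_y, center_x),
    with filter centers spaced `stride` pixels apart. The read is a
    weighted sum of pixels, so it is differentiable in all attention
    parameters and gradients can flow through it during training.
    (Illustrative sketch; not the paper's exact parameterization.)
    """
    H, W = image.shape
    # Offsets of the filter centers along each axis.
    offsets = (np.arange(N) - N / 2.0 + 0.5) * stride
    mu_y = center_y + offsets                      # (N,)
    mu_x = center_x + offsets                      # (N,)
    # Gaussian filterbank matrices: F_y is (N, H), F_x is (N, W).
    F_y = np.exp(-((np.arange(H) - mu_y[:, None]) ** 2) / (2.0 * sigma ** 2))
    F_x = np.exp(-((np.arange(W) - mu_x[:, None]) ** 2) / (2.0 * sigma ** 2))
    # Normalize each filter so its weights sum to 1.
    F_y /= F_y.sum(axis=1, keepdims=True) + 1e-8
    F_x /= F_x.sum(axis=1, keepdims=True) + 1e-8
    # Separable read: glimpse = F_y @ image @ F_x^T, shape (N, N).
    return F_y @ image @ F_x.T

# A recurrent model would emit (center_y, center_x, stride, sigma) at each
# time step and receive the resulting glimpse as its next input.
image = np.zeros((32, 32))
image[10, 10] = 1.0                                # a single bright "object"
on_target = gaussian_glimpse(image, 10, 10, stride=1.0, sigma=1.0)
off_target = gaussian_glimpse(image, 25, 25, stride=1.0, sigma=1.0)
```

A glimpse centered on the object captures far more of its mass than one centered elsewhere, which is the signal a tracker can exploit.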
