The role of spatio-temporal synchrony in the encoding of motion
We consider the task of inferring motion from an image sequence. We show that the detection of a spatial transformation can be viewed as the detection of synchrony between the image sequence and a sequence of features undergoing that transformation. The classic motion energy model can be derived from this view by introducing phase invariance via pooling. The synchrony view therefore allows us to disentangle the contributions of invariance and of motion estimation in the energy model. It also makes it possible to derive local learning rules for the unsupervised learning of motion representations from data. We show that a model based on local learning achieves competitive performance on a wide variety of motion understanding tasks and performs much better than hand-crafted spatio-temporal features.
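To make the link between synchrony, pooling, and the energy model concrete, here is a minimal toy sketch in Python with NumPy. It is an illustration under stated assumptions, not the model or learning rule from the paper: the input is a 1-D sinusoidal grating, the "features" are fixed sine and cosine filters rather than learned ones, and the names `grating`, `motion_energy`, and `estimate_shift` are hypothetical. Each candidate shift gets a quadrature pair of filters whose second-frame copy is translated by that shift. The per-frame filter responses are summed and squared, then pooled over the pair.

```python
import numpy as np

# Toy setup (assumption): 1-D frames of N pixels and a single spatial frequency.
N = 64
OMEGA = 2 * np.pi * 4 / N  # 4 cycles per frame, i.e. a period of 16 pixels
n = np.arange(N)


def grating(shift, phase):
    """A sinusoidal 'image' translated by `shift` pixels."""
    return np.cos(OMEGA * (n - shift) + phase)


def motion_energy(x1, x2, d):
    """Energy response of a unit tuned to a shift of `d` pixels.

    The frame-2 filters are the frame-1 filters translated by d, so the
    filter pair itself undergoes the transformation the unit detects.
    """
    c1, s1 = np.cos(OMEGA * n), np.sin(OMEGA * n)              # frame-1 quadrature pair
    c2, s2 = np.cos(OMEGA * (n - d)), np.sin(OMEGA * (n - d))  # translated copies
    even = c1 @ x1 + c2 @ x2  # sum of responses across the two frames
    odd = s1 @ x1 + s2 @ x2
    # Squaring produces cross terms between the two frames' responses;
    # pooling over the quadrature pair removes the dependence on phase.
    return even**2 + odd**2


def estimate_shift(x1, x2, candidates):
    """Return the candidate shift whose unit responds most strongly."""
    energies = [motion_energy(x1, x2, d) for d in candidates]
    return candidates[int(np.argmax(energies))]


if __name__ == "__main__":
    true_shift = 3
    candidates = list(range(-4, 5))
    for phase in (0.0, 0.7, 2.0):
        x1, x2 = grating(0, phase), grating(true_shift, phase)
        print(phase, estimate_shift(x1, x2, candidates))
```

In this toy case the energy works out to (N/2)^2 (2 + 2 cos(OMEGA (d - s))) for a true shift s, so it peaks when the candidate shift matches the true one and does not depend on the phase of the grating. The cross term obtained by expanding the squares is a product of the frame-1 and frame-2 filter responses, which is large only when the two responses agree, that is, when they are in synchrony.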