Polar Transformer Networks
- ViT
Convolutional neural networks (CNNs) are equivariant with respect to translation; a translation in the input causes a translation in the output. Attempts to generalize equivariance have concentrated on rotations. In this paper, we combine the idea of the spatial transformer, and the canonical coordinate representations of groups (polar transform) to realize a network that is invariant to translation, and equivariant to rotation and scale. A conventional CNN is used to predict the origin of a polar transform. The polar transform is performed in a differentiable way, similar to the Spatial Transformer Networks, and the resulting polar representation is fed into a second CNN. The model is trained end-to-end with a classification loss. We apply the method on variations of MNIST, obtained by perturbing it with clutter, translation, rotation, and scaling. We achieve state of the art performance in the rotated MNIST, with fewer parameters and faster training time than previous methods, and we outperform all tested methods in the SIM2MNIST dataset, which we introduce.
View on arXiv