Normalized Matching Transformer

Abstract

We present a new state-of-the-art approach for sparse keypoint matching between pairs of images. Our method is fully deep-learning based: it combines a visual backbone with a SplineCNN graph neural network for feature processing, and a normalized transformer decoder that, together with the Sinkhorn algorithm, decodes keypoint correspondences. The model is trained with a contrastive loss and a hyperspherical loss for better feature representations. We additionally use data augmentation during training. This comparatively simple architecture, combining extensive normalization and advanced losses, outperforms the current state-of-the-art approaches BBGM, ASAR, COMMON and GMTR on the PascalVOC and SPair-71k datasets by 5.1% and 2.2%, respectively, while training for at least 1.7x fewer epochs.

@article{pourhadi2025_2503.17715,
  title={Normalized Matching Transformer},
  author={Abtin Pourhadi and Paul Swoboda},
  journal={arXiv preprint arXiv:2503.17715},
  year={2025}
}