Efficient distributed representations with linear-time attention scores normalization

30 March 2023

Lorenzo Dall'Amico

Enrico Belliardo


Main: 10 Pages

3 Figures

Bibliography: 4 Pages

Appendix: 5 Pages

Abstract

The attention score matrix ${\rm SoftMax}(XY^T)$ encodes relational similarity patterns between objects and is extremely popular in machine learning. However, the cost of computing it scales quadratically with the problem size, which makes it computationally expensive. In this article, we propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms. We show on several pre-trained embeddings that the accuracy of our estimation formula surpasses that of competing kernel methods, in some cases by orders of magnitude. From this result, we design a linear-time, task-agnostic embedding algorithm based on the optimization of the attention scores. The proposed algorithm is highly interpretable and easily adapted to an arbitrary embedding problem. We consider a few use cases and observe similar or higher performance and lower computation time compared with similar embedding algorithms.
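To make the cost argument concrete, here is a minimal Python sketch of the quantity being approximated. It takes the normalization constants to be $Z_i = \sum_j \exp(x_i^T y_j)$ and compares the exact, quadratic-time computation with a simple linear-time estimate. The estimate is an assumption made for illustration and is not the estimation formula of the paper: it treats the rows of $Y$ as samples from a single Gaussian with mean $\mu$ and covariance $\Sigma$, so that the Gaussian moment generating function gives $Z_i \approx m \exp(x_i^T \mu + \tfrac{1}{2} x_i^T \Sigma x_i)$ for $m$ rows. The function names `exact_norm_constants` and `gaussian_norm_constants` and the synthetic small-norm data are likewise hypothetical.

```python
import math
import random


def exact_norm_constants(X, Y):
    """Z_i = sum_j exp(x_i . y_j); the cost grows with len(X) * len(Y)."""
    return [
        sum(math.exp(sum(a * b for a, b in zip(x, y))) for y in Y)
        for x in X
    ]


def gaussian_norm_constants(X, Y):
    """Approximate Z_i by len(Y) * exp(x_i . mu + 0.5 * x_i^T Sigma x_i).

    ASSUMPTION (illustrative, not the paper's estimator): the rows of Y
    behave like samples from one Gaussian with mean mu and covariance Sigma.
    """
    m, d = len(Y), len(Y[0])
    # One pass over Y for the mean and one for the covariance: O(m * d^2).
    mu = [sum(y[k] for y in Y) / m for k in range(d)]
    sigma = [
        [sum((y[a] - mu[a]) * (y[b] - mu[b]) for y in Y) / m for b in range(d)]
        for a in range(d)
    ]
    approx = []
    for x in X:
        # Each row of X now costs O(d^2), independent of len(Y).
        linear = sum(x[k] * mu[k] for k in range(d))
        quadratic = sum(
            x[a] * sigma[a][b] * x[b] for a in range(d) for b in range(d)
        )
        approx.append(m * math.exp(linear + 0.5 * quadratic))
    return approx


# Synthetic embeddings with small norms (hypothetical data, fixed seed).
random.seed(0)
d = 4
Y = [[random.gauss(0.0, 0.3) for _ in range(d)] for _ in range(2000)]
X = [[random.gauss(0.0, 0.3) for _ in range(d)] for _ in range(5)]

Z_exact = exact_norm_constants(X, Y)
Z_approx = gaussian_norm_constants(X, Y)
for z, z_hat in zip(Z_exact, Z_approx):
    print(f"exact={z:.2f}  approx={z_hat:.2f}  rel_err={abs(z - z_hat) / z:.2e}")
```

With $n$ rows in $X$, $m$ rows in $Y$ and dimension $d$, the exact computation costs $O(nmd)$, while the sketch costs $O((n + m)d^2)$, which is linear in the number of vectors for fixed $d$. The single-Gaussian assumption is only reasonable here because the synthetic data are Gaussian by construction; real embeddings may need a richer model.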

