Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

International Conference on Learning Representations (ICLR), 2025
Main: 9 pages
5 figures
Bibliography: 2 pages
Appendix: 5 pages
Abstract

Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian kernel matrices $K \in \mathbb{R}^{n\times n}$. The columns of $K$ are indexed by a set of $n$ keys $k_1, k_2, \ldots, k_n \in \mathbb{R}^d$, the rows by a set of $n$ queries $q_1, q_2, \ldots, q_n \in \mathbb{R}^d$, and its $(i,j)$ entry is $K_{ij} = e^{-\|q_i - k_j\|_2^2 / 2\sigma^2}$ for some bandwidth parameter $\sigma > 0$. Given a vector $x \in \mathbb{R}^n$ and an error parameter $\epsilon > 0$, our task is to output a $y \in \mathbb{R}^n$ such that $\|Kx - y\|_2 \leq \epsilon \|x\|_2$ in time subquadratic in $n$ and linear in $d$. Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to worst-case quadratic growth. We validate this assumption experimentally for Gaussian kernel matrices encountered in various settings, such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption for unrestricted vectors.
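The sketch below is not the paper's subquadratic algorithm; it only makes the problem statement concrete under illustrative assumptions (random queries, keys, and input vector). It computes the exact $O(n^2 d)$ baseline product $Kx$ for the asymmetric Gaussian kernel, reports the total kernel mass $\sum_{ij} K_{ij}$ normalized by $n$ (the quantity the modelling assumption requires to stay bounded), and checks the relative-error criterion $\|Kx - y\|_2 \leq \epsilon \|x\|_2$ against a placeholder approximation. All function and variable names are hypothetical.

```python
import numpy as np

def gaussian_kernel_matvec_exact(Q, Keys, x, sigma):
    """Exact K @ x for the asymmetric Gaussian kernel K_ij = exp(-||q_i - k_j||^2 / (2 sigma^2)).

    Q: (n, d) queries, Keys: (n, d) keys, x: (n,) vector, sigma: bandwidth.
    Runs in O(n^2 d) time; the paper targets a subquadratic approximation
    under the assumption that sum_ij K_ij grows only linearly in n.
    """
    # Pairwise squared distances ||q_i - k_j||^2 via ||q||^2 + ||k||^2 - 2 <q, k>.
    sq = (Q ** 2).sum(axis=1)[:, None] + (Keys ** 2).sum(axis=1)[None, :] - 2.0 * Q @ Keys.T
    Kmat = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
    return Kmat @ x, Kmat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, sigma, eps = 2000, 64, 1.0, 0.1
    Q = rng.normal(size=(n, d))
    Keys = rng.normal(size=(n, d))
    x = rng.normal(size=n)

    y_exact, Kmat = gaussian_kernel_matvec_exact(Q, Keys, x, sigma)

    # Modelling assumption: total kernel mass sum_ij K_ij = O(n), not the worst-case O(n^2),
    # so this ratio should stay roughly constant as n grows.
    print("total kernel mass / n =", Kmat.sum() / n)

    # Any candidate output y must satisfy ||Kx - y||_2 <= eps * ||x||_2.
    y_approx = y_exact + rng.normal(scale=1e-4, size=n)  # placeholder approximation
    print("relative error ok:", np.linalg.norm(y_exact - y_approx) <= eps * np.linalg.norm(x))
```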
