Matrix Product Sketching via Coordinated Sampling

International Conference on Learning Representations (ICLR), 2025
Main: 9 pages · 5 figures · Bibliography: 4 pages · Appendix: 5 pages
Abstract

We revisit the well-studied problem of approximating a matrix product, $\mathbf{A}^T\mathbf{B}$, based on small space sketches $\mathcal{S}(\mathbf{A})$ and $\mathcal{S}(\mathbf{B})$ of $\mathbf{A} \in \mathbb{R}^{n \times d}$ and $\mathbf{B} \in \mathbb{R}^{n \times m}$. We are interested in the setting where the sketches must be computed independently of each other, except for the use of a shared random seed. We prove that, when $\mathbf{A}$ and $\mathbf{B}$ are sparse, methods based on \emph{coordinated random sampling} can outperform classical linear sketching approaches, like Johnson-Lindenstrauss projection or CountSketch. For example, to obtain Frobenius norm error $\epsilon\|\mathbf{A}\|_F\|\mathbf{B}\|_F$, coordinated sampling requires sketches of size $O(s/\epsilon^2)$ when $\mathbf{A}$ and $\mathbf{B}$ have at most $s \leq d,m$ non-zeros per row. In contrast, linear sketching leads to sketches of size $O(d/\epsilon^2)$ and $O(m/\epsilon^2)$ for $\mathbf{A}$ and $\mathbf{B}$. We empirically evaluate our approach on two applications: 1) distributed linear regression in databases, a problem motivated by tasks like dataset discovery and augmentation, and 2) approximating attention matrices in transformer-based language models. In both cases, our sampling algorithms yield an order of magnitude improvement over linear sketching.
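To illustrate the core idea of coordinated sampling with a shared random seed, the following is a minimal sketch of the simplest variant: uniform row sampling in which both parties, independently but with the same seed, keep exactly the same subset of row indices, so sampled rows can be matched up at estimation time. This is an illustrative simplification, not the paper's full (norm-weighted) sampling scheme, and all function names here are hypothetical.

```python
import numpy as np

def coordinated_sketch(M, p, seed=0):
    """Keep each row i with probability p, decided by a pseudorandom value
    h(i) derived from the shared seed. Because both parties use the same
    seed, they keep the SAME row indices without communicating."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    h = rng.random(n)            # shared "hash" values h(i) in [0, 1)
    kept = np.flatnonzero(h < p) # coordinated: depends only on seed and p
    return kept, M[kept]

def estimate_product(sketch_A, sketch_B, p):
    """Unbiased estimate of A^T B from two coordinated sketches: each kept
    row pair contributes its outer product, rescaled by 1/p to account for
    the sampling probability."""
    idx_A, rows_A = sketch_A
    idx_B, rows_B = sketch_B
    assert np.array_equal(idx_A, idx_B)  # guaranteed by coordination
    return rows_A.T @ rows_B / p
```

With `p = 1` every row is kept and the estimate is exact; for `p < 1` the estimate is unbiased with variance controlled by the number of sampled rows. The paper's bounds come from a more refined, weighted sampling scheme, but the seed-sharing mechanism is the same.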
