Efficient 1-bit tensor approximations
We present a spatially efficient decomposition of matrices and arbitrary-order tensors as linear combinations of tensor products of $\{-1, 1\}$-valued vectors. For any matrix $A \in \mathbb{R}^{m \times n}$, $A - R_w = S_w C_w T_w^\top = \sum_{j=1}^{w} c_j\, s_j t_j^\top$ is a {\it $w$-width signed cut decomposition of $A$}. Here $C_w = \mathrm{diag}(c)$ for some $c \in \mathbb{R}^w$, and the vectors $s_j, t_j$ are $\{-1, 1\}$-valued. To store $(S_w, T_w, C_w)$, we may pack $w(m + n)$ bits, and require only $w$ floating point numbers. As a function of $w$, $\|R_w\|_F$ exhibits exponential decay when applied to \textit{f32} matrices with i.i.d. $\mathcal{N}(0, 1)$ entries. Choosing $w$ so that $(S_w, T_w, C_w)$ has the same memory footprint as a \textit{f16} or \textit{bf16} matrix, the relative error is comparable. Our algorithm yields efficient signed cut decompositions in $20$ lines of pseudocode. It reflects a simple modification of a celebrated 1999 paper [1] of Frieze and Kannan. As a first application, we approximate the weight matrices in the open \textit{Mistral-7B-v0.1} Large Language Model to a $50\%$ spatial compression. Remarkably, all $226$ remainder matrices have a relative error under $6\%$, and the expanded model closely matches \textit{Mistral-7B-v0.1} on the {\it huggingface} leaderboard [2]. Benchmark performance degrades slowly as we reduce the spatial compression from $50\%$ to $25\%$. We optimize our open source \textit{rust} implementation [3] with \textit{simd} instructions on \textit{avx2} and \textit{avx512} architectures. We also extend our algorithm from matrices to tensors of arbitrary order and use it to compress a picture of the first author's cat Angus.
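To make the decomposition concrete, here is a minimal Rust sketch of one plausible way to compute a $w$-width signed cut decomposition: greedily extract rank-one sign patterns $c_j\, s_j t_j^\top$ from the remainder by alternating sign maximization, in the spirit of Frieze and Kannan [1]. All names and the specific update rule are illustrative assumptions, not the paper's exact 20-line algorithm or its optimized \textit{rust} implementation [3].

```rust
// Hypothetical sketch: greedy w-width signed cut decomposition.
// Repeatedly fit a rank-one sign pattern c * s * t^T to the remainder R,
// then subtract it. Alternating maximization and the least-squares
// coefficient are illustrative choices, not the paper's exact method.
fn signed_cut_decompose(
    a: &[Vec<f32>],
    w: usize,
    iters: usize,
) -> (Vec<Vec<i8>>, Vec<Vec<i8>>, Vec<f32>) {
    let (m, n) = (a.len(), a[0].len());
    let mut r: Vec<Vec<f32>> = a.to_vec(); // remainder R, starts at A
    let (mut ss, mut ts, mut cs) = (Vec::new(), Vec::new(), Vec::new());
    for _ in 0..w {
        let mut s = vec![1i8; m];
        let mut t = vec![1i8; n];
        for _ in 0..iters {
            // Fix t, set s_i = sign(sum_j R_ij t_j); then symmetrically for t.
            for i in 0..m {
                let dot: f32 = (0..n).map(|j| r[i][j] * t[j] as f32).sum();
                s[i] = if dot >= 0.0 { 1 } else { -1 };
            }
            for j in 0..n {
                let dot: f32 = (0..m).map(|i| r[i][j] * s[i] as f32).sum();
                t[j] = if dot >= 0.0 { 1 } else { -1 };
            }
        }
        // Least-squares coefficient for the fixed sign pattern s t^T.
        let num: f32 = (0..m)
            .map(|i| (0..n).map(|j| r[i][j] * (s[i] * t[j]) as f32).sum::<f32>())
            .sum();
        let c = num / (m * n) as f32;
        // Subtract c * s * t^T from the remainder.
        for i in 0..m {
            for j in 0..n {
                r[i][j] -= c * (s[i] * t[j]) as f32;
            }
        }
        ss.push(s);
        ts.push(t);
        cs.push(c);
    }
    (ss, ts, cs)
}
```

Each extracted column of $S_w$ and $T_w$ costs one bit per entry plus a single float $c_j$, matching the $w(m+n)$-bit, $w$-float storage budget described above.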
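The storage claim above rests on packing each $\{-1, 1\}$-valued vector at one bit per entry. A minimal Rust sketch of such packing follows; the encoding ($+1 \mapsto$ bit $1$, $-1 \mapsto$ bit $0$) and the word-level layout are assumptions for illustration, not the layout of the paper's implementation.

```rust
// Hypothetical sketch: pack a {-1, 1}-valued vector into u64 words,
// one bit per entry (+1 -> 1, -1 -> 0), and unpack it back.
fn pack_signs(v: &[i8]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (idx, &s) in v.iter().enumerate() {
        if s > 0 {
            words[idx / 64] |= 1u64 << (idx % 64);
        }
    }
    words
}

fn unpack_signs(words: &[u64], len: usize) -> Vec<i8> {
    (0..len)
        .map(|idx| if (words[idx / 64] >> (idx % 64)) & 1 == 1 { 1 } else { -1 })
        .collect()
}
```

Packing the columns of $S_w$ and $T_w$ this way uses exactly $w(m + n)$ bits, leaving only the $w$ coefficients in $C_w$ as floating point numbers.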