
Towards Sampling Data Structures for Tensor Products in Turnstile Streams

Main: 13 pages
Bibliography: 10 pages
Abstract

This paper addresses the computational challenges of large-scale attention-based models in artificial intelligence by applying importance-sampling methods in the streaming setting. Inspired by the classical definition of the $\ell_2$ sampler and recent progress on attention schemes in Large Language Models (LLMs), we introduce the notion of an attention sampler. Our approach significantly reduces the computational cost of the traditional attention mechanism. We analyze the effectiveness of the attention sampler from a theoretical perspective, bounding its space usage and update time. Moreover, our framework is scalable and broadly applicable across model architectures and domains.
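Since the abstract is the only description given here, the sketch below illustrates only the classical $\ell_2$ sampler that the attention sampler takes as its starting point, using the standard precision-sampling idea for turnstile streams; it is not the paper's construction. The class name `L2Sampler` and the exact-storage simplification are assumptions for readability (a true streaming sampler would replace the stored vector with a CountSketch-style data structure to keep space polylogarithmic).

```python
import math
import random


class L2Sampler:
    """Toy $\ell_2$ sampler over n coordinates in a turnstile stream.

    Precision sampling: each coordinate x_i is scaled by 1/sqrt(t_i) with
    t_i ~ Uniform(0,1). The index whose scaled value dominates is returned,
    which is (approximately) distributed as x_i^2 / ||x||_2^2.
    NOTE: this toy version stores the scaled vector exactly; a real
    streaming sampler would use a sketch instead.
    """

    def __init__(self, n: int, seed: int = 0):
        rng = random.Random(seed)
        # One scaling factor per coordinate, fixed up front.
        self.scale = [1.0 / math.sqrt(rng.random() + 1e-12) for _ in range(n)]
        self.z = [0.0] * n  # scaled vector z_i = x_i / sqrt(t_i)

    def update(self, i: int, delta: float) -> None:
        # Turnstile update x_i += delta; delta may be negative.
        self.z[i] += self.scale[i] * delta

    def sample(self) -> int:
        # Return the index with the largest |z_i|.
        return max(range(len(self.z)), key=lambda i: abs(self.z[i]))


# Usage: feed signed updates, then draw a sample.
s = L2Sampler(n=4, seed=1)
s.update(2, 5.0)
s.update(0, 1.0)
s.update(2, -1.0)
print(s.sample())  # index 2 is returned with probability roughly 16/17
```

Averaging over many independent copies of such a sampler (fresh randomness per copy) recovers the sampling distribution empirically; the paper's contribution, per the abstract, is extending this style of sampler to attention computations while controlling space and update time.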
