
Sparse Mamba: Reinforcing Controllability In Structural State Space Models

Main: 7 Pages
2 Figures
Bibliography: 3 Pages
5 Tables
Abstract

In this work, we introduce the concepts of controllability and observability to the Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Recent structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and large language models at small to medium scale while addressing their computational inefficiency. The Mamba SSM architecture removes the need for the attention layers and multilayer perceptron blocks used in transformers. However, current Mamba models do not reinforce controllability in the state-space equations when computing the $A$, $B$, $C$, and $D$ matrices at each time step, which leads to increased complexity and computational cost. In this paper, we demonstrate a reduction in the number of parameters compared to the first published Mamba and Mamba2. We show a 5% improvement in perplexity and a 3% decrease in training time after reinforcing controllability and observability on the original Mamba architecture in our proposed S-Mamba. The controllable $n \times n$ state matrix $A$ is sparse and has only $n$ free parameters. Our approach ensures a controllable system, which we expect to be the gateway to Mamba3.
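The abstract does not spell out the structure of the sparse state matrix. As an illustration only, the sketch below assumes the standard controllable canonical (companion) form from control theory, which has exactly $n$ free parameters in an $n \times n$ matrix and is controllable for any choice of those parameters when paired with $B = (0, \dots, 0, 1)^\top$. The function names (`companion_matrix`, `controllability_matrix`, `rank`) and the example coefficients are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def companion_matrix(coeffs):
    """Build an n x n matrix in controllable canonical (companion) form.

    Assumption for illustration: ones on the superdiagonal (fixed), the n
    free parameters (negated) in the last row, zeros everywhere else.
    """
    n = len(coeffs)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = Fraction(1)
    for j in range(n):
        A[n - 1][j] = -Fraction(coeffs[j])
    return A


def mat_vec(A, v):
    """Multiply matrix A by column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def controllability_matrix(A, B):
    """Return [B, AB, A^2 B, ..., A^(n-1) B] as an n x n matrix."""
    n = len(A)
    cols = [list(B)]
    for _ in range(n - 1):
        cols.append(mat_vec(A, cols[-1]))
    # Transpose so that each vector in `cols` becomes a matrix column.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def rank(M):
    """Rank via Gaussian elimination over exact fractions."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


# Hypothetical example with n = 4 free parameters.
coeffs = [Fraction(1, 2), Fraction(-3), Fraction(2), Fraction(1, 4)]
A = companion_matrix(coeffs)
B = [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]
n = len(A)

# The pair (A, B) is controllable when the controllability matrix has rank n.
is_controllable = rank(controllability_matrix(A, B)) == n
nonzero_entries = sum(1 for row in A for x in row if x != 0)
```

In this form at most $2n - 1$ of the $n^2$ entries are nonzero, and only the $n$ entries of the last row are free; the rank check passes regardless of their values. Whether S-Mamba uses exactly this parameterization is not stated in the abstract.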
