v1v2 (latest)

Sparse Mamba: Reinforcing Controllability In Structural State Space Models

31 August 2024

Main:7 Pages

2 Figures

Bibliography:3 Pages

5 Tables

Abstract

In this work, we introduce the concept of controllability and observability to the Mamba SSM's architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. The structured state space model (SSM) development in recent studies, such as Mamba and Mamba2, outperformed and solved the computational inefficiency of transformers and large language models at small to medium scale. The Mamba SSMs architecture drops the need for attention layers or multilayer perception blocks in transformers. However, current Mamba models lack reinforcement of controllability in state-space equations for computing the $A$ , $B$ , $C$ , and $D$ matrices at each time step, leading to increased complexity and computational costs. In this paper, we demonstrate a reduction of parameters in comparison to the first published Mamba and Mamba2. We showcase an improvement in perplexity by 5\% and a decrease in training time by 3\% after reinforcing controllability and observability on the original Mamba architecture in our proposed S-Mamba. The controllable $n \times n$ state matrix $A$ is sparse and it has only $n$ free parameters. Our novel approach will ensure a controllable system which will be the gate key for Mamba3.

View on arXiv

Comments on this paper