Simplifying and Understanding State Space Models with Diagonal Linear RNNs

1 December 2022
Ankit Gupta
Harsh Mehta
Jonathan Berant
arXiv:2212.00768
Abstract

Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step, and propose a model based on vanilla Diagonal Linear RNNs (DLR). We empirically show that, despite being conceptually much simpler, DLR is as performant as previously-proposed SSMs on a variety of tasks and benchmarks including Long Range Arena and raw speech classification. Moreover, we characterize the expressivity of SSMs (including DLR) and attention-based models via a suite of 13 synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs report near-perfect performance on tasks that can be modeled via few convolutional kernels, they struggle on tasks requiring many such kernels and especially when the desired sequence manipulation is context-dependent. Despite these limitations, DLR reaches high performance on two higher-order reasoning tasks, ListOpsSubTrees and PathfinderSegmentation-256, with input lengths 8K and 65K respectively, and gives encouraging performance on PathfinderSegmentation-512 with input length 262K, for which attention is not a viable choice.
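For readers skimming the abstract, a minimal sketch of the diagonal linear RNN that DLR is built on may help. This is an illustration only, not the authors' implementation: the parameter names (lam, w), the single-input/single-output shape, and the naive sequential scan are assumptions made for brevity; what it reflects from the abstract is simply a diagonal state-transition matrix applied without any discretization step.

    import numpy as np

    def dlr_layer(u, lam, w):
        """Minimal diagonal linear RNN scan (illustrative sketch, not the paper's code).

        u   : (L,) real input sequence
        lam : (N,) complex diagonal state-transition entries (|lam| <= 1 for stability)
        w   : (N,) complex output projection

        Recurrence, with no discretization step:
            x_k = lam * x_{k-1} + u_k      (elementwise, x_0 = 0)
            y_k = Re(w . x_k)
        """
        x = np.zeros(lam.shape[0], dtype=complex)
        y = np.empty(u.shape[0])
        for k, u_k in enumerate(u):
            x = lam * x + u_k              # diagonal transition: purely elementwise update
            y[k] = np.real(w @ x)          # project the complex state to a real output
        return y

    # Example usage with hypothetical values: stable complex eigenvalues, random projection.
    rng = np.random.default_rng(0)
    L, N = 16, 8
    lam = np.exp(-0.1 + 1j * rng.uniform(0, np.pi, size=N))
    w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    y = dlr_layer(rng.standard_normal(L), lam, w)

Unrolling the recurrence shows that such a layer computes a 1-D convolution with kernel K[j] = Re(sum_n w[n] * lam[n]**j), which is why the abstract can frame task expressivity in terms of how many convolutional kernels a task requires.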
