arXiv:2406.07887

An Empirical Study of Mamba-based Language Models

12 June 2024

Tri Dao

Albert Gu

Ali Hatamizadeh

Deepak Narayanan

Garvit Kulshreshtha

Jan Kautz

Mohammad Shoeybi

Bryan Catanzaro

Papers citing "An Empirical Study of Mamba-based Language Models"

50 / 94 papers shown

PerfMamba: Performance Analysis and Pruning of Selective State Space Models

Abdullah Al Asif

Mobina Kashaniyan

321

0

0

28 Nov 2025

Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression

Aditya Chattopadhyay

464

0

0

26 Nov 2025

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Matthijs Van Keirsbilck

...

Maksim Khadkevich

Pavlo Molchanov

164

0

0

24 Nov 2025

Selective Rotary Position Embedding

Timur Carstensen

Antonio Orvieto

301

0

0

21 Nov 2025

Analysis of heart failure patient trajectories using sequence modeling

Annika Rosengren

Martin Lindgren

Christina E. Lundberg

284

0

0

20 Nov 2025

Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

Alexander Anokhin

Egor Vedernikov

Mikhail Burtsev

Trushkov Alexey

180

0

0

10 Nov 2025

Attention and Compression is all you need for Controllably Efficient Language Models

Rajesh Ranganath

467

0

0

07 Nov 2025

Apriel-H1: Towards Efficient Enterprise Reasoning Models

Oleksiy Ostapenko

J. Lamy-Poirier

...

Sébastien Paquet

Srinivas Sunkara

Valérie Bécaert

Sathwik Tejaswi Madhusudhan

Torsten Scholak

134

2

0

04 Nov 2025

FlashEVA: Accelerating LLM inference via Efficient Attention

Juan Gabriel Kostelec

164

0

0

01 Nov 2025

Kimi Linear: An Expressive, Efficient Attention Architecture

...

138

11

0

30 Oct 2025

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction

Philip N. Garner

262

0

0

23 Oct 2025

Some Attention is All You Need for Retrieval

90

0

0

21 Oct 2025

To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models

Sinead Williamson

164

0

0

16 Oct 2025

CymbaDiff: Structured Spatial Diffusion for Sketch-based 3D Semantic Urban Scene Generation

253

0

0

15 Oct 2025

Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning

209

0

0

14 Oct 2025

Design Principles for Sequence Models via Coefficient Dynamics

Antonio Orvieto

Melanie Zeilinger

Carmen Amo Alonso

100

0

0

10 Oct 2025

Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling

Michael Harries

109

0

0

07 Oct 2025

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

152

4

0

06 Oct 2025

Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space

Tomás Figliolia

Nicholas Alonso

Quentin Anthony

136

1

0

06 Oct 2025

Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis

153

1

0

01 Oct 2025

TTT3R: 3D Reconstruction as Test-Time Training

276

16

0

30 Sep 2025

MemMamba: Rethinking Memory Patterns in State Space Model

Yangjingyi Chen

162

0

0

28 Sep 2025

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models

Aleksandar Terzić

Michael Hersche

180

0

0

26 Sep 2025

StateX: Enhancing RNN Recall via Post-training State Expansion

98

0

0

26 Sep 2025

Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data

Zhi-Qin John Xu

172

0

0

22 Sep 2025

TreeGPT: Pure TreeFFN Encoder-Decoder Architecture for Structured Reasoning Without Attention Mechanisms

153

1

0

06 Sep 2025

Revisiting associative recall in modern recurrent models

Destiny Okpekpe

Antonio Orvieto

135

0

0

26 Aug 2025

Characterizing the Behavior of Training Mamba-based State Space Models on GPUs

Trinayan Baruah

Kaustubh Shivdikar

91

1

0

25 Aug 2025

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

255

14

0

21 Aug 2025

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Abhijit Khairnar

Abhijit Paithankar

Abhinav Khattar

...

Keshav Santhanam

Krzysztof Pawelec

298

0

0

20 Aug 2025

Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative

187

3

0

12 Aug 2025

Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks

149

1

0

25 Jul 2025

Scaling Linear Attention with Sparse State Expansion

283

0

0

22 Jul 2025

ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies

Aryeh Kontorovich

156

0

0

18 Jul 2025

Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length

Saptarshi Mitra

295

2

0

16 Jul 2025

Lizard: An Efficient Linearization Framework for Large Language Models

Chien Van Nguyen

Hanieh Deilamsalehy

...

Franck Dernoncourt

247

2

0

11 Jul 2025

Differential Mamba

Nadav Schneider

Itamar Zimerman

336

1

0

08 Jul 2025

Understanding and Improving Length Generalization in Recurrent Models

Ricardo Buitrago Ruiz

250

4

0

03 Jul 2025

Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention

221

1

0

01 Jul 2025

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

325

6

0

11 Jun 2025

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention

414

2

0

11 Jun 2025

Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers

Samuel J. Gershman

221

6

0

31 May 2025

LoLA: Low-Rank Linear Attention With Sparse Caching

Robert W. Heath Jr.

339

4

0

29 May 2025

Zebra-Llama: Towards Extremely Efficient Hybrid Models

Mehdi Rezagholizadeh

228

4

0

22 May 2025

Mechanistic evaluation of Transformers and state space models

Nikil Roashan Selvam

Róbert Csordás

Christopher Potts

408

3

0

21 May 2025

Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking

402

1

0

19 May 2025

Block-Biased Mamba for Long-Range Sequence Processing

N. Benjamin Erichson

350

3

0

13 May 2025

Overflow Prevention Enhances Long-Context Recurrent LLMs

Itamar Zimerman

M. Jehanzeb Mirza

Leonid Karlinsky

401

3

0

12 May 2025

Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access

434

1

0

23 Apr 2025

Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism

444

4

0

22 Apr 2025