An Empirical Study of Mamba-based Language Models (arXiv:2406.07887)
12 June 2024
R. Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, V. Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro

Papers citing "An Empirical Study of Mamba-based Language Models"

50 of 94 citing papers shown:
PerfMamba: Performance Analysis and Pruning of Selective State Space Models (28 Nov 2025) [Mamba]
Abdullah Al Asif, Mobina Kashaniyan, Sixing Yu, J. P. Muñoz, Ali Jannesari

Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression (26 Nov 2025)
Liangzu Peng, Aditya Chattopadhyay, Luca Zancato, Elvis Nunez, Wei Xia, Stefano Soatto

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models (24 Nov 2025)
Y. Fu, Xin Dong, Shizhe Diao, Matthijs Van Keirsbilck, Hanrong Ye, ..., Maksim Khadkevich, A. Keller, Jan Kautz, Y. Lin, Pavlo Molchanov

Selective Rotary Position Embedding (21 Nov 2025)
Sajad Movahedi, Timur Carstensen, Arshia Afzal, Frank Hutter, Antonio Orvieto, Volkan Cevher

Analysis of heart failure patient trajectories using sequence modeling (20 Nov 2025) [Mamba]
Falk Dippel, Yinan Yu, Annika Rosengren, Martin Lindgren, Christina E. Lundberg, Erik Aerts, Martin Adiels, Helen Sjöland

Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training (10 Nov 2025) [RALM]
A. Sorokin, N. Buzun, Alexander Anokhin, Oleg Inozemcev, Egor Vedernikov, Petr Anokhin, Mikhail Burtsev, Trushkov Alexey, Yin Wenshuai, Evgeny Burnaev

Attention and Compression is all you need for Controllably Efficient Language Models (07 Nov 2025) [MQVLM]
Jatin Prakash, N. Jethani, Rajesh Ranganath

Apriel-H1: Towards Efficient Enterprise Reasoning Models (04 Nov 2025) [LRM]
Oleksiy Ostapenko, Luke Kumar, Raymond Li, Denis Kocetkov, J. Lamy-Poirier, ..., Sébastien Paquet, Srinivas Sunkara, Valérie Bécaert, Sathwik Tejaswi Madhusudhan, Torsten Scholak

FlashEVA: Accelerating LLM inference via Efficient Attention (01 Nov 2025)
Juan Gabriel Kostelec, Qinghai Guo

Kimi Linear: An Expressive, Efficient Attention Architecture (30 Oct 2025)
Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, J. Hu, ..., Guokun Lai, Yuxin Wu, Xinyu Zhou, Zhilin Yang, Yulun Du

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction (23 Oct 2025) [CLL]
Mutian He, Philip N. Garner

Some Attention is All You Need for Retrieval (21 Oct 2025)
Felix Michalak, Steven Abreu

To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models (16 Oct 2025)
Eran Malach, Omid Saremi, Sinead Williamson, Arwen Bradley, Aryo Lotfi, Emmanuel Abbe, J. Susskind, Etai Littwin

CymbaDiff: Structured Spatial Diffusion for Sketch-based 3D Semantic Urban Scene Generation (15 Oct 2025)
Li Liang, Bo Miao, Xinyu Wang, Naveed Akhtar, Jordan Vice, Ajmal Mian

Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning (14 Oct 2025)
Junsoo Oh, Wei Huang, Taiji Suzuki

Design Principles for Sequence Models via Coefficient Dynamics (10 Oct 2025)
Jerome Sieber, Antonio Orvieto, Melanie Zeilinger, Carmen Amo Alonso

Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling (07 Oct 2025)
Mary Llewellyn, Annie Gray, Josh Collyer, Michael Harries

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights (06 Oct 2025)
Sangmin Bae, Bilge Acun, Haroun Habeeb, S. Kim, Chien-Yu Lin, Liang Luo, Junjie Wang, Carole-Jean Wu

Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space (06 Oct 2025) [MQ]
Tomás Figliolia, Nicholas Alonso, Rishi Iyer, Quentin Anthony, Beren Millidge

Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis (01 Oct 2025) [MLT]
Hongkang Li, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Meng Wang

TTT3R: 3D Reconstruction as Test-Time Training (30 Sep 2025) [3DV]
Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen

MemMamba: Rethinking Memory Patterns in State Space Model (28 Sep 2025) [Mamba]
Youjin Wang, Yangjingyi Chen, Jiahao Yan, Jiaxuan Lu, Xiao Sun

StateX: Enhancing RNN Recall via Post-training State Expansion (26 Sep 2025)
Xingyu Shen, Yingfa Chen, Zhen Leng Thai, Xu Han, Zhiyuan Liu, Maosong Sun

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models (26 Sep 2025)
Aleksandar Terzić, Nicolas Menet, Michael Hersche, Thomas Hofmann, Abbas Rahimi

Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data (22 Sep 2025) [Mamba]
Tianyi Chen, Pengxiao Lin, Zhiwei Wang, Zhi-Qin John Xu

TreeGPT: Pure TreeFFN Encoder-Decoder Architecture for Structured Reasoning Without Attention Mechanisms (06 Sep 2025)
Zixi Li

Revisiting associative recall in modern recurrent models (26 Aug 2025)
Destiny Okpekpe, Antonio Orvieto

Characterizing the Behavior of Training Mamba-based State Space Models on GPUs (25 Aug 2025) [Mamba]
Trinayan Baruah, Kaustubh Shivdikar, Sara Prescott, David Kaeli

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search (21 Aug 2025)
Yuxian Gu, Qinghao Hu, Shang Yang, Haocheng Xi, Junyu Chen, Song Han, Han Cai

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model (20 Aug 2025) [LRM]
Nvidia, Aarti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, ..., Keith Wyss, Keshav Santhanam, Kezhi Kong, Krzysztof Pawelec, Kumar Anik

Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative (12 Aug 2025) [Mamba]
Xi Xuan, Zimo Zhu, Wenxin Zhang, Yi-Cheng Lin, Tomi Kinnunen

Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks (25 Jul 2025)
Kai Liu, Zhan Su, Peijie Dong, Fengran Mo, Jianfei Gao, ShaoTing Zhang, Kai-xiang Chen

Scaling Linear Attention with Sparse State Expansion (22 Jul 2025)
Yuqi Pan, Yongqi An, Zheng Li, Yuhong Chou, Ruijie Zhu, Xiaohui Wang, Mingxuan Wang, Jinqiao Wang, Guoqi Li

ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies (18 Jul 2025) [AI4TS]
Itay Katav, Aryeh Kontorovich

Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length (16 Jul 2025)
Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon

Lizard: An Efficient Linearization Framework for Large Language Models (11 Jul 2025) [KELM]
Chien Van Nguyen, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Viet Dac Lai, ..., Ryan Rossi, Trung H. Bui, N. Vlassis, Franck Dernoncourt, T. Nguyen

Differential Mamba (08 Jul 2025) [Mamba]
Nadav Schneider, Itamar Zimerman, Eliya Nachmani

Understanding and Improving Length Generalization in Recurrent Models (03 Jul 2025)
Ricardo Buitrago Ruiz, Albert Gu

Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention (01 Jul 2025)
Zhihao Zhan, Jianan Zhao, Zhaocheng Zhu, Jian Tang

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding (11 Jun 2025)
Yiran Peng, Jingze Shi, Yifan Wu, Nan Tang, Yuyu Luo

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention (11 Jun 2025)
Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, Aditya Akella

Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers (31 May 2025)
Kazuki Irie, Morris Yau, Samuel J. Gershman

LoLA: Low-Rank Linear Attention With Sparse Caching (29 May 2025) [RALM]
Luke McDermott, Robert W. Heath Jr., Rahul Parhi

Zebra-Llama: Towards Extremely Efficient Hybrid Models (22 May 2025)
Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum

Mechanistic evaluation of Transformers and state space models (21 May 2025)
Aryaman Arora, Neil Rathi, Nikil Roashan Selvam, Róbert Csordás, Dan Jurafsky, Christopher Potts

Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking (19 May 2025) [WIGM]
Zihan Su, Xuerui Qiu, Hongbin Xu, Tangyu Jiang, Junhao Zhuang, Chun Yuan, Ming Li, Shengfeng He, Fei Richard Yu

Block-Biased Mamba for Long-Range Sequence Processing (13 May 2025) [Mamba]
Annan Yu, N. Benjamin Erichson

Overflow Prevention Enhances Long-Context Recurrent LLMs (12 May 2025) [LRM]
Assaf Ben-Kish, Itamar Zimerman, M. Jehanzeb Mirza, James R. Glass, Leonid Karlinsky, Raja Giryes

Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access (23 Apr 2025) [Mamba]
Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, Wei Wu

LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement (ICLR 2025; 22 Apr 2025) [Mamba]
Zhifan Ye, Kejing Xia, Yonggan Fu, Xin Dong, Jihoon Hong, Xiangchi Yuan, Shizhe Diao, Jan Kautz, Pavlo Molchanov, Yingyan Lin