An Empirical Study of Mamba-based Language Models

12 June 2024
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
Tri Dao
Albert Gu
Ali Hatamizadeh
Sudhakar Singh
Deepak Narayanan
Garvit Kulshreshtha
Vartika Singh
Jared Casper
Jan Kautz
Mohammad Shoeybi
Bryan Catanzaro
ArXiv (abs) | PDF | HTML | HuggingFace (1 upvote)

Papers citing "An Empirical Study of Mamba-based Language Models"

44 / 94 papers shown
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
415
4
0
22 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
255
0
0
09 Apr 2025
The Use of Gaze-Derived Confidence of Inferred Operator Intent in Adjusting Safety-Conscious Haptic Assistance
Jeremy D. Webb
Michael Bowman
Songpo Li
Xiaoli Zhang
292
0
0
04 Apr 2025
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
Siqi Zhang
Yanyuan Qiao
Qunbo Wang
Zike Yan
Qi Wu
Zhihua Wei
Qingbin Liu
486
3
0
31 Mar 2025
TransMamba: Flexibly Switching between Transformer and Mamba
Yixing Li
Ruobing Xie
Zhen Yang
Xingwu Sun
Shuaipeng Li
...
Zhanhui Kang
Yu Cheng
C. Xu
Di Wang
Jie Jiang
Mamba
265
7
0
31 Mar 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
1.0K
2
0
28 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
410
8
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jing Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE, LM&Ro, LRM
577
64
0
18 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
366
2
0
17 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
Günter Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
237
10
0
17 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
IEEE Custom Integrated Circuits Conference (CICC), 2025
Neusha Javidnia
B. Rouhani
F. Koushanfar
1.1K
3
0
14 Mar 2025
Fixed-Point RNNs: Interpolating from Diagonal to Dense
Sajad Movahedi
Felix Sarnthein
Nicola Muca Cirone
Antonio Orvieto
414
1
0
13 Mar 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghai Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MA, ELM, AI4MH
332
31
0
18 Feb 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
479
4
0
16 Feb 2025
Adjoint sharding for very long context training of state space models
Xingzi Xu
Amir Tavanaei
Kavosh Asadi
Karim Bouyarmane
181
0
0
03 Jan 2025
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Elvis Nunez
Luca Zancato
Benjamin Bowman
Aditya Golatkar
Wei Xia
Stefano Soatto
450
7
0
17 Dec 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Computer Vision and Pattern Recognition (CVPR), 2024
Hongjie Wang
Chih-Yao Ma
Yen-Cheng Liu
Ji Hou
Tao Xu
...
Peizhao Zhang
Tingbo Hou
Peter Vajda
N. Jha
Xiaoliang Dai
LMTDVGenVLMDiffM
387
25
0
13 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
517
13
0
28 Nov 2024
BadScan: An Architectural Backdoor Attack on Visual State Space Models
Om Suhas Deshmukh
Sankalp Nagaonkar
A. Tripathi
Ashish Mishra
Mamba
251
0
0
26 Nov 2024
Hymba: A Hybrid-head Architecture for Small Language Models
International Conference on Learning Representations (ICLR), 2024
Xin Dong
Y. Fu
Shizhe Diao
Wonmin Byeon
Zijia Chen
...
Min-Hung Chen
Yoshi Suhara
Y. Lin
Jan Kautz
Pavlo Molchanov
Mamba
294
50
0
20 Nov 2024
A Mamba Foundation Model for Time Series Forecasting
Haoyu Ma
Yushu Chen
Wenlai Zhao
Jinzhe Yang
Yingsheng Ji
Xinghua Xu
Xiaozhu Liu
Hao Jing
Shengzhuo Liu
Guangwen Yang
AI4TS, Mamba
296
9
0
05 Nov 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
175
2
0
31 Oct 2024
Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers
bioRxiv, 2024
Yingheng Wang
Zichen Wang
Gil Sadeh
Luca Zancato
Alessandro Achille
George Karypis
Huzefa Rangwala
347
4
0
29 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba, RALM
129
2
0
24 Oct 2024
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
Chen Ziwen
Hao Tan
Kai Zhang
Sai Bi
Fujun Luan
Yicong Hong
Li Fuxin
Zexiang Xu
3DGS, 3DV
418
55
0
16 Oct 2024
Mimetic Initialization Helps State Space Models Learn to Recall
Asher Trockman
Hrayr Harutyunyan
J. Zico Kolter
Sanjiv Kumar
Srinadh Bhojanapalli
Mamba
120
8
0
14 Oct 2024
MatMamba: A Matryoshka State Space Model
Abhinav Shukla
Sai H. Vemprala
Aditya Kusupati
Ashish Kapoor
Mamba
203
3
0
09 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
International Conference on Learning Representations (ICLR), 2024
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
Jia-Nan Li
Weiyao Lin
VLM
349
4
0
09 Oct 2024
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Jingwei Zuo
Maksim Velikanov
Dhia Eddine Rhaiem
Ilyas Chahed
Younes Belkada
Guillaume Kunsch
Hakim Hacid
ALM
234
38
0
07 Oct 2024
Exploring the Limitations of Mamba in COPY and CoT Reasoning
Ruifeng Ren
Zhicong Li
Yong Liu
215
3
0
04 Oct 2024
How to Train Long-Context Language Models (Effectively)
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
618
87
0
03 Oct 2024
Inference-Friendly Models With MixAttention
Shashank Rajput
Ying Sheng
Sean Owen
Vitaliy Chiley
264
4
0
23 Sep 2024
Protein-Mamba: Biological Mamba Models for Protein Function Prediction
Bohao Xu
Yingzhou Lu
Yoshitaka Inoue
Namkyeong Lee
Tianfan Fu
Jintai Chen
Mamba
175
5
0
22 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
419
3
0
16 Sep 2024
A Cost-Aware Approach to Adversarial Robustness in Neural Networks
Charles Meyers
Mohammad Reza Saleh Sedghpour
Tommy Löfstedt
Erik Elmroth
OOD, AAML
172
0
0
11 Sep 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2024
Yu Zhang
Aaron Courville
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
245
48
0
11 Sep 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Jamba Team
Barak Lenz
Alan Arazi
Amir Bergman
Avshalom Manevich
...
Yehoshua Cohen
Yonatan Belinkov
Y. Globerson
Yuval Peleg Levy
Y. Shoham
198
47
0
22 Aug 2024
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
Neural Information Processing Systems (NeurIPS), 2024
R. Prabhakar
Hengrui Zhang
D. Wentzlaff
265
1
0
14 Aug 2024
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models
Zifeng Ding
Yifeng Li
Yuan He
Antonio Norelli
Jingcheng Wu
Volker Tresp
Yunpu Ma
Michael Bronstein
Mamba
361
11
0
08 Aug 2024
BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba
Ling Yue
Sixue Xing
Yingzhou Lu
Tianfan Fu
Mamba, AI4CE
207
10
0
05 Aug 2024
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
Xilin Jiang
Yinghao Aaron Li
Adrian Nicolas Florea
Cong Han
N. Mesgarani
Mamba
207
29
0
13 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
466
309
0
11 Jul 2024
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning
Xiaojie Li
Jianlong Wu
Yue Yu
Guohao Li
Ming-Hsuan Yang
Liqiang Nie
M. Zhang
Mamba
426
9
0
08 Jul 2024
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser
Jingze Shi
Ting Xie
Yiran Peng
Chunjun Zheng
Kai Wang
86
2
0
24 Jun 2024