2406.07887
An Empirical Study of Mamba-based Language Models
12 June 2024
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
Tri Dao
Albert Gu
Ali Hatamizadeh
Sudhakar Singh
Deepak Narayanan
Garvit Kulshreshtha
Vartika Singh
Jared Casper
Jan Kautz
M. Shoeybi
Bryan Catanzaro
Papers citing "An Empirical Study of Mamba-based Language Models"
49 / 49 papers shown
Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention
Xiang Hu
Jiaqi Leng
Jun Zhao
Kewei Tu
Wei Wu
Mamba
45
0
0
23 Apr 2025
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement
Zhifan Ye
Kejing Xia
Yonggan Fu
Xin Dong
Jihoon Hong
Xiangchi Yuan
Shizhe Diao
Jan Kautz
Pavlo Molchanov
Yingyan Lin
Mamba
42
3
0
22 Apr 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
81
0
0
22 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
21
0
0
09 Apr 2025
The Use of Gaze-Derived Confidence of Inferred Operator Intent in Adjusting Safety-Conscious Haptic Assistance
Jeremy D. Webb
Michael Bowman
Songpo Li
Xiaoli Zhang
34
0
0
04 Apr 2025
TransMamba: Flexibly Switching between Transformer and Mamba
Yixing Li
Ruobing Xie
Zhen Yang
X. Sun
Shuaipeng Li
...
Zhanhui Kang
Yu Cheng
C. Xu
Di Wang
Jie Jiang
Mamba
59
1
0
31 Mar 2025
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
Siqi Zhang
Yanyuan Qiao
Qunbo Wang
Zike Yan
Qi Wu
Zhihua Wei
J. Liu
48
0
0
31 Mar 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
60
0
0
28 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
59
1
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Hannah Brandon
Prithvijit Chattopadhyay
Huayu Chen
...
Yao Xu
X. Yang
Zhuolin Yang
Xiaohui Zeng
Z. Zhang
LM&Ro
LRM
AI4CE
52
5
0
18 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
58
0
0
17 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
G. Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
40
1
0
17 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
55
0
0
14 Mar 2025
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
Sajad Movahedi
Felix Sarnthein
Nicola Muca Cirone
Antonio Orvieto
46
2
0
13 Mar 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
B. Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Z. Wu
LM&MA
ELM
AI4MH
37
4
0
18 Feb 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
59
1
0
16 Feb 2025
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Elvis Nunez
L. Zancato
Benjamin Bowman
Aditya Golatkar
W. Xia
Stefano Soatto
73
2
0
17 Dec 2024
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang
Chih-Yao Ma
Yen-Cheng Liu
Ji Hou
Tao Xu
...
Peizhao Zhang
Tingbo Hou
Peter Vajda
N. Jha
Xiaoliang Dai
LMTD
DiffM
VGen
VLM
81
5
0
13 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
87
4
0
28 Nov 2024
BadScan: An Architectural Backdoor Attack on Visual State Space Models
Om Suhas Deshmukh
Sankalp Nagaonkar
A. Tripathi
Ashish Mishra
Mamba
74
0
0
26 Nov 2024
Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong
Y. Fu
Shizhe Diao
Wonmin Byeon
Zijia Chen
...
Min-Hung Chen
Yoshi Suhara
Y. Lin
Jan Kautz
Pavlo Molchanov
Mamba
97
21
0
20 Nov 2024
A Mamba Foundation Model for Time Series Forecasting
Haoyu Ma
Yushu Chen
Wenlai Zhao
Jinzhe Yang
Yingsheng Ji
Xinghua Xu
Xiaozhu Liu
Hao Jing
Shengzhuo Liu
Guangwen Yang
AI4TS
Mamba
39
1
0
05 Nov 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
28
1
0
31 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba
RALM
29
1
0
24 Oct 2024
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
Chen Ziwen
Hao Tan
Kai Zhang
Sai Bi
Fujun Luan
Yicong Hong
Li Fuxin
Zexiang Xu
3DGS
3DV
26
16
0
16 Oct 2024
Mimetic Initialization Helps State Space Models Learn to Recall
Asher Trockman
Hrayr Harutyunyan
J. Zico Kolter
Sanjiv Kumar
Srinadh Bhojanapalli
Mamba
21
3
0
14 Oct 2024
MatMamba: A Matryoshka State Space Model
Abhinav Shukla
Sai H. Vemprala
Aditya Kusupati
Ashish Kapoor
Mamba
28
0
0
09 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
Jianguo Li
Weiyao Lin
VLM
36
1
0
09 Oct 2024
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Jingwei Zuo
Maksim Velikanov
Dhia Eddine Rhaiem
Ilyas Chahed
Younes Belkada
Guillaume Kunsch
Hakim Hacid
ALM
52
12
0
07 Oct 2024
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren
Zhicong Li
Yong Liu
39
1
0
04 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
69
37
0
03 Oct 2024
Inference-Friendly Models With MixAttention
Shashank Rajput
Ying Sheng
Sean Owen
Vitaliy Chiley
74
1
0
23 Sep 2024
Protein-Mamba: Biological Mamba Models for Protein Function Prediction
Bohao Xu
Yingzhou Lu
Yoshitaka Inoue
Namkyeong Lee
Tianfan Fu
Jintai Chen
Mamba
24
1
0
22 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
61
1
0
16 Sep 2024
A Cost-Aware Approach to Adversarial Robustness in Neural Networks
Charles Meyers
Mohammad Reza Saleh Sedghpour
Tommy Löfstedt
Erik Elmroth
OOD
AAML
24
0
0
11 Sep 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang
Songlin Yang
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
60
15
0
11 Sep 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Jamba Team
Barak Lenz
Alan Arazi
Amir Bergman
Avshalom Manevich
...
Yehoshua Cohen
Yonatan Belinkov
Y. Globerson
Yuval Peleg Levy
Y. Shoham
29
26
0
22 Aug 2024
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
R. Prabhakar
Hengrui Zhang
D. Wentzlaff
23
0
0
14 Aug 2024
BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba
Ling Yue
Sixue Xing
Yingzhou Lu
Tianfan Fu
Mamba
AI4CE
24
7
0
05 Aug 2024
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
Xilin Jiang
Yinghao Aaron Li
Adrian Nicolas Florea
Cong Han
N. Mesgarani
Mamba
38
9
0
13 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
48
112
0
11 Jul 2024
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning
Xiaojie Li
Yibo Yang
Jianlong Wu
Bernard Ghanem
Liqiang Nie
Min Zhang
Mamba
36
5
0
08 Jul 2024
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser
Jingze Shi
Ting Xie
Bingheng Wu
Chunjun Zheng
Kai Wang
20
2
0
24 Jun 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
53
116
0
29 Feb 2024
Nemotron-4 15B Technical Report
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
M. Patwary
Sandeep Subramanian
...
Ashwath Aithal
Oleksii Kuchaiev
M. Shoeybi
Jonathan Cohen
Bryan Catanzaro
31
21
0
26 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
95
77
0
01 Feb 2024
Zoology: Measuring and Improving Recall in Efficient Language Models
Simran Arora
Sabri Eyuboglu
Aman Timalsina
Isys Johnson
Michael Poli
James Zou
Atri Rudra
Christopher Ré
56
65
0
08 Dec 2023
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
202
791
0
13 Sep 2019