Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2006.16236
Cited By
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
29 June 2020
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
Franccois Fleuret
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention"
50 / 111 papers shown
Title
Balancing Computation Load and Representation Expressivity in Parallel Hybrid Neural Networks
Mohammad Mahdi Moradi
Walid Ahmed
Shuangyue Wen
Sudhir Mudur
Weiwei Zhang
Yang Liu
27
0
0
26 May 2025
Kernel Space Diffusion Model for Efficient Remote Sensing Pansharpening
Hancong Jin
Zihan Cao
Liangjian Deng
DiffM
103
0
0
25 May 2025
Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization
Francois Chaubard
Mykel J. Kochenderfer
MQ
AI4CE
113
0
0
23 May 2025
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu
Xiaobo Xia
Weixiang Zhao
Manyi Zhang
Xianzhi Yu
Xiu Su
Shuo Yang
See-Kiong Ng
Tat-Seng Chua
KELM
LRM
60
0
0
23 May 2025
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
Shuang Wu
Youtian Lin
Feihu Zhang
Yifei Zeng
Yikang Yang
...
Jiachen Qian
Siyu Zhu
Xun Cao
Philip Torr
Yao Yao
3DGS
58
0
0
23 May 2025
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang
Mehdi Rezagholizadeh
Guihong Li
Vikram Appia
Emad Barsoum
38
0
0
22 May 2025
VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning
Qianyue Hu
Junyan Wu
Wei Lu
Xiangyang Luo
DiffM
AAML
40
0
0
18 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
171
2
0
01 May 2025
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
Xuelong Li
Guangliang Cheng
Mamba
158
0
0
01 May 2025
RWKV-X: A Linear Complexity Hybrid Language Model
Haowen Hou
Zhiyi Huang
Kaifeng Tan
Rongchang Lu
Fei Richard Yu
VLM
108
0
0
30 Apr 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
100
2
0
24 Apr 2025
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller
Jonas Golde
Alan Akbik
67
0
0
19 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
153
1
0
01 Apr 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
104
1
0
18 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu Cheng
MoE
169
2
0
07 Mar 2025
Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali
Nikolos Gurney
David Pynadath
AI4TS
133
14
0
05 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
127
0
0
04 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu Cheng
92
0
0
03 Mar 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
109
0
0
28 Feb 2025
Single-Channel EEG Tokenization Through Time-Frequency Modeling
Jathurshan Pradeepkumar
Xihao Piao
Zheng Chen
Jimeng Sun
85
1
0
22 Feb 2025
Lightweight yet Efficient: An External Attentive Graph Convolutional Network with Positional Prompts for Sequential Recommendation
Jinyu Zhang
Chao Li
Zhongying Zhao
108
0
0
21 Feb 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
KELM
3DV
106
2
0
21 Feb 2025
Enhancing RWKV-based Language Models for Long-Sequence Text Generation
Xinghan Pan
83
0
0
21 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu Cheng
KELM
108
3
0
19 Feb 2025
Associative Recurrent Memory Transformer
Ivan Rodkin
Yuri Kuratov
Aydar Bulatov
Andrey Kravchenko
91
3
0
17 Feb 2025
Twilight: Adaptive Attention Sparsity with Hierarchical Top-
p
p
p
Pruning
C. Lin
Jiaming Tang
Shuo Yang
Hanshuo Wang
Tian Tang
Boyu Tian
Ion Stoica
Enze Xie
Mingyu Gao
110
2
0
04 Feb 2025
Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention
Arya Honarpisheh
Mustafa Bozdag
Octavia Camps
Mario Sznaier
Mamba
105
1
0
03 Feb 2025
Explaining Context Length Scaling and Bounds for Language Models
Jingzhe Shi
Qinwei Ma
Hongyi Liu
Hang Zhao
Jeng-Neng Hwang
Lei Li
LRM
169
3
0
03 Feb 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
99
1
0
28 Jan 2025
State-space models are accurate and efficient neural operators for dynamical systems
Zheyuan Hu
Nazanin Ahmadi Daryakenari
Qianli Shen
Kenji Kawaguchi
George Karniadakis
Mamba
AI4CE
124
16
0
28 Jan 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
386
2
0
25 Jan 2025
ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet
Jos Rozen
Laurent Besacier
RALM
178
2
0
20 Jan 2025
Generative Retrieval for Book search
Yubao Tang
Ruqing Zhang
Jiafeng Guo
Maarten de Rijke
Shihao Liu
Shuaiqiang Wang
Dawei Yin
Xueqi Cheng
RALM
94
0
0
19 Jan 2025
Towards Scalable and Stable Parallelization of Nonlinear RNNs
Xavier Gonzalez
Andrew Warrington
Jimmy T.H. Smith
Scott W. Linderman
157
10
0
17 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
116
12
0
11 Jan 2025
Key-value memory in the brain
Samuel J. Gershman
Ila Fiete
Kazuki Irie
67
7
0
06 Jan 2025
A Separable Self-attention Inspired by the State Space Model for Computer Vision
Juntao Zhang
Shaogeng Liu
Kun Bian
You Zhou
Pei Zhang
Jianning Liu
Jun Zhou
Bingyan Liu
Mamba
86
0
0
03 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
208
666
0
31 Dec 2024
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu
Meng Lou
Yizhou Yu
206
1
0
16 Dec 2024
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow
Suhana Bedi
Miguel Angel Fuentes Hernandez
E. Steinberg
Jason Alan Fries
Christopher Ré
Sanmi Koyejo
N. Shah
138
5
0
09 Dec 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
151
3
0
22 Nov 2024
MambaIRv2: Attentive State Space Restoration
Hang Guo
Yong Guo
Yaohua Zha
Yulun Zhang
Wenbo Li
Tao Dai
Shu-Tao Xia
Yawei Li
Mamba
143
17
0
22 Nov 2024
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Riccardo Grazzi
Julien N. Siems
Jörg Franke
Arber Zela
Frank Hutter
Massimiliano Pontil
121
16
0
19 Nov 2024
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
78
1
0
12 Nov 2024
ETO:Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses
Junjie Ni
Guofeng Zhang
Guanglin Li
Yijin Li
Xinyang Liu
Zhaoyang Huang
Hujun Bao
ViT
87
2
0
30 Oct 2024
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Poppel
Johannes Brandstetter
Günter Klambauer
Razvan Pascanu
Sepp Hochreiter
172
5
0
29 Oct 2024
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An
Guolei Sun
Yun Liu
Runjia Li
Min Wu
Ming-Ming Cheng
Ender Konukoglu
Serge Belongie
85
6
0
29 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
60
4
0
24 Oct 2024
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Boxing Chen
Sarath Chandar
103
0
0
22 Oct 2024
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion
Chaodong Xiao
Minghan Li
Zhengqiang Zhang
Deyu Meng
Lei Zhang
Mamba
113
5
0
19 Oct 2024
1
2
3
Next