arXiv: 2202.10447
Transformer Quality in Linear Time
International Conference on Machine Learning (ICML), 2022
21 February 2022
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
Papers citing "Transformer Quality in Linear Time" (showing 50 of 129)
TNT: Improving Chunkwise Training for Test-Time Memorization
Zeman Li, Ali Behrouz, Yuan Deng, Peilin Zhong, Praneeth Kacham, Mahdi Karami, Meisam Razaviyayn, Vahab Mirrokni. 10 Nov 2025.

GroupKAN: Rethinking Nonlinearity with Grouped Spline-based KAN Modeling for Efficient Medical Image Segmentation
Guojie Li, Anwar P.P. Abdul Majeed, Muhammad Ateeq, Anh Nguyen, Fan Zhang. 07 Nov 2025. [MedIm]

FlashEVA: Accelerating LLM inference via Efficient Attention
Juan Gabriel Kostelec, Qinghai Guo. 01 Nov 2025.

Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, J. Hu, ..., Guokun Lai, Yuxin Wu, Xinyu Zhou, Zhilin Yang, Yulun Du. 30 Oct 2025.

Attentive Convolution: Unifying the Expressivity of Self-Attention with Convolutional Efficiency
Hao Yu, H. G. Chen, Yan Jiang, Wei Peng, Zhaodong Sun, Samuel Kaski, Guoying Zhao. 23 Oct 2025.

Artificial Hippocampus Networks for Efficient Long-Context Modeling
Yunhao Fang, Weihao Yu, Shu Zhong, Qinghao Ye, Xuehan Xiong, Lai Wei. 08 Oct 2025.

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
Sangmin Bae, Bilge Acun, Haroun Habeeb, S. Kim, Chien-Yu Lin, Liang Luo, Junjie Wang, Carole-Jean Wu. 06 Oct 2025.

StateX: Enhancing RNN Recall via Post-training State Expansion
Xingyu Shen, Yingfa Chen, Zhen Leng Thai, Xu Han, Zhiyuan Liu, Maosong Sun. 26 Sep 2025.

FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer
Haoxu Wang, Yiheng Jiang, Gang Qiao, Pengteng Shi, Biao Tian. 27 Aug 2025.

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Shang Yang, Haocheng Xi, Junyu Chen, Song Han, Han Cai. 21 Aug 2025.

Fast weight programming and linear transformers: from machine learning to neurobiology
Kazuki Irie, Samuel J. Gershman. 11 Aug 2025.

Efficient Attention Mechanisms for Large Language Models: A Survey
Yutao Sun, Zhenyu Li, Yike Zhang, Tengyu Pan, Bowen Dong, Yuyi Guo, Jianyong Wang. 25 Jul 2025.

RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Xiuying Wei, Anunay Yadav, Razvan Pascanu, Çağlar Gülçehre. 06 Jul 2025. [AI4TS]

VSRM: A Robust Mamba-Based Framework for Video Super-Resolution
Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim. 28 Jun 2025.

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
J. Oswald, Nino Scherrer, Seijin Kobayashi, Luca Versari, Songlin Yang, ..., Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas, João Sacramento. 05 Jun 2025.

Log-Linear Attention
Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim. 05 Jun 2025. [Mamba]

Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
Kazuki Irie, Morris Yau, Samuel J. Gershman. 31 May 2025.

ATLAS: Learning to Optimally Memorize the Context at Test Time
Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni. 29 May 2025.

S2AFormer: Strip Self-Attention for Efficient Vision Transformer
IEEE Transactions on Image Processing (IEEE TIP), 2025
Guoan Xu, Wenfeng Huang, Wenjing Jia, Jiamao Li, Guangwei Gao, Guo-Jun Qi. 28 May 2025.

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu, Zhaoxiang Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, ..., Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin. 10 May 2025. [MoE]

PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining
Ciyu Ruan, Ruishan Guo, Zihang Gong, Jinfeng Xu, Wenhan Yang, Xinlei Chen. 08 May 2025. [Mamba]

Hadamard product in deep learning: Introduction, Advances and Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Grigorios G. Chrysos, Yongtao Wu, Razvan Pascanu, Philip Torr, Volkan Cevher. 17 Apr 2025. [AAML]

SAFT: Structure-aware Transformers for Textual Interaction Classification
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Hongtao Wang, Renchi Yang, Hewen Wang, Haoran Zheng, Jianliang Xu. 07 Apr 2025.

FLAMES: A Hybrid Spiking-State Space Model for Adaptive Memory Retention in Event-Based Learning
Biswadeep Chakraborty, Saibal Mukhopadhyay. 02 Apr 2025.

Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks
Thomas Bailie, Yun Sing Koh, S. Karthik Mukkavilli, V. Vetrova. 01 Apr 2025. [AI4TS]

ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
Nicola Muca Cirone, C. Salvi. 01 Apr 2025.

Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck, Korbinian Poppel, Phillip Lippe, Sepp Hochreiter. 18 Mar 2025.

xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck, Korbinian Poppel, Phillip Lippe, Richard Kurle, P. Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter. 17 Mar 2025. [LRM]

Parallel Sequence Modeling via Generalized Spatial Propagation Network
Computer Vision and Pattern Recognition (CVPR), 2025
Hongjun Wang, Wonmin Byeon, Jiarui Xu, Liang Feng, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu. 21 Jan 2025.

Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Long Lei, Jun Zhou, Jialun Pei, Baoliang Zhao, Yueming Jin, Yuen-Chun Jeremy Teoh, Jing Qin, Pheng-Ann Heng. 20 Jan 2025.

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Neural Information Processing Systems (NeurIPS), 2024
Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, Jian Wu, Bo Xu, Guoqi Li. 16 Nov 2024.

Scene Graph Generation with Role-Playing Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Guikun Chen, Jin Li, Wenguan Wang. 20 Oct 2024. [VLM]

HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song. 14 Oct 2024.

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
International Conference on Learning Representations (ICLR), 2024
Junxuan Wang, Xuyang Ge, Wentao Shu, Qiong Tang, Yunhua Zhou, Zhengfu He, Xipeng Qiu. 09 Oct 2024.

Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
International Conference on Learning Representations (ICLR), 2024
Zhihao He, Hang Yu, Zi Gong, Shizhan Liu, Jia-Nan Li, Weiyao Lin. 09 Oct 2024. [VLM]

Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2024
Yu Zhang, Aaron Courville, Ruijie Zhu, Yue Zhang, Leyang Cui, ..., Freda Shi, Bailin Wang, Wei Bi, P. Zhou, Guohong Fu. 11 Sep 2024.

1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data
Calvin Tan, Jerome Wang. 07 Aug 2024. [ALM]

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu. 24 Jun 2024. [ODL]

tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity
Xing Fang, Chenpeng Yu, Shiye Tian, Hui Liu. 24 Jun 2024.

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li. 12 Jun 2024.

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin. 11 Jun 2024.

Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism
Chang Zong, Jian Shao, Weiming Lu, Yueting Zhuang. 06 Jun 2024.

D-FaST: Cognitive Signal Decoding with Disentangled Frequency-Spatial-Temporal Attention
WeiGuo Chen, Changjian Wang, Kele Xu, Yuan Yuan, Yanru Bai, Dongsong Zhang. 02 Jun 2024.

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong. 27 May 2024.

Demystify Mamba in Vision: A Linear Attention Perspective
Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang. 26 May 2024. [Mamba]

RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis
Yaxin Liu, Yan Zhou, Ziming Li, Jinchuan Zhang, Yu Shang, Chenyang Zhang, Songlin Hu. 20 May 2024.

Improving Transformers with Dynamically Composable Multi-Head Attention
International Conference on Machine Learning (ICML), 2024
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan. 14 May 2024.

BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations
Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, ..., Lifang He, Chen Tang, Luke Huan, Wei Wang, Carl Yang. 30 Apr 2024.

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro, Vijay Srinivas Agneeswaran. 24 Apr 2024. [Mamba]

A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang. 22 Apr 2024.
Page 1 of 3