Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
International Conference on Machine Learning (ICML), 2022. 21 February 2022. arXiv:2202.10447.

Papers citing "Transformer Quality in Linear Time"

50 / 129 papers shown
TNT: Improving Chunkwise Training for Test-Time Memorization. Zeman Li, Ali Behrouz, Yuan Deng, Peilin Zhong, Praneeth Kacham, Mahdi Karami, Meisam Razaviyayn, Vahab Mirrokni. 10 Nov 2025.
GroupKAN: Rethinking Nonlinearity with Grouped Spline-based KAN Modeling for Efficient Medical Image Segmentation. Guojie Li, Anwar P.P. Abdul Majeed, Muhammad Ateeq, Anh Nguyen, Fan Zhang. 07 Nov 2025.
FlashEVA: Accelerating LLM inference via Efficient Attention. Juan Gabriel Kostelec, Qinghai Guo. 01 Nov 2025.
Kimi Linear: An Expressive, Efficient Attention Architecture. Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, J. Hu, ..., Guokun Lai, Yuxin Wu, Xinyu Zhou, Zhilin Yang, Yulun Du. 30 Oct 2025.
Attentive Convolution: Unifying the Expressivity of Self-Attention with Convolutional Efficiency. Hao Yu, H. G. Chen, Yan Jiang, Wei Peng, Zhaodong Sun, Samuel Kaski, Guoying Zhao. 23 Oct 2025.
Artificial Hippocampus Networks for Efficient Long-Context Modeling. Yunhao Fang, Weihao Yu, Shu Zhong, Qinghao Ye, Xuehan Xiong, Lai Wei. 08 Oct 2025.
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights. Sangmin Bae, Bilge Acun, Haroun Habeeb, S. Kim, Chien-Yu Lin, Liang Luo, Junjie Wang, Carole-Jean Wu. 06 Oct 2025.
StateX: Enhancing RNN Recall via Post-training State Expansion. Xingyu Shen, Yingfa Chen, Zhen Leng Thai, Xu Han, Zhiyuan Liu, Maosong Sun. 26 Sep 2025.
FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer. Haoxu Wang, Yiheng Jiang, Gang Qiao, Pengteng Shi, Biao Tian. 27 Aug 2025.
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search. Yuxian Gu, Qinghao Hu, Shang Yang, Haocheng Xi, Junyu Chen, Song Han, Han Cai. 21 Aug 2025.
Fast weight programming and linear transformers: from machine learning to neurobiology. Kazuki Irie, Samuel J. Gershman. 11 Aug 2025.
Efficient Attention Mechanisms for Large Language Models: A Survey. Yutao Sun, Zhenyu Li, Yike Zhang, Tengyu Pan, Bowen Dong, Yuyi Guo, Jianyong Wang. 25 Jul 2025.
RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling (TPAMI, 2025). Xiuying Wei, Anunay Yadav, Razvan Pascanu, Çağlar Gülçehre. 06 Jul 2025.
VSRM: A Robust Mamba-Based Framework for Video Super-Resolution. Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim. 28 Jun 2025.
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training. J. Oswald, Nino Scherrer, Seijin Kobayashi, Luca Versari, Songlin Yang, ..., Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas, João Sacramento. 05 Jun 2025.
Log-Linear Attention. Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim. 05 Jun 2025.
Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers. Kazuki Irie, Morris Yau, Samuel J. Gershman. 31 May 2025.
ATLAS: Learning to Optimally Memorize the Context at Test Time. Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni. 29 May 2025.
S2AFormer: Strip Self-Attention for Efficient Vision Transformer. Guoan Xu, Wenfeng Huang, Wenjing Jia, Jiamao Li, Guangwei Gao, Guo-Jun Qi. 28 May 2025.
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free. Zihan Qiu, Zhaoxiang Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, ..., Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin. 10 May 2025.
PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining. Ciyu Ruan, Ruishan Guo, Zihang Gong, Jinfeng Xu, Wenhan Yang, Xinlei Chen. 08 May 2025.
Hadamard product in deep learning: Introduction, Advances and Challenges (TPAMI, 2025). Grigorios G. Chrysos, Yongtao Wu, Razvan Pascanu, Philip Torr, Volkan Cevher. 17 Apr 2025.
SAFT: Structure-aware Transformers for Textual Interaction Classification (SIGIR, 2025). Hongtao Wang, Renchi Yang, Hewen Wang, Haoran Zheng, Jianliang Xu. 07 Apr 2025.
FLAMES: A Hybrid Spiking-State Space Model for Adaptive Memory Retention in Event-Based Learning. Biswadeep Chakraborty, Saibal Mukhopadhyay. 02 Apr 2025.
Reducing Smoothness with Expressive Memory Enhanced Hierarchical Graph Neural Networks. Thomas Bailie, Yun Sing Koh, S. Karthik Mukkavilli, V. Vetrova. 01 Apr 2025.
ParallelFlow: Parallelizing Linear Transformers via Flow Discretization. Nicola Muca Cirone, C. Salvi. 01 Apr 2025.
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels. M. Beck, Korbinian Poppel, Phillip Lippe, Sepp Hochreiter. 18 Mar 2025.
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference. M. Beck, Korbinian Poppel, Phillip Lippe, Richard Kurle, P. Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter. 17 Mar 2025.
Parallel Sequence Modeling via Generalized Spatial Propagation Network (CVPR, 2025). Hongjun Wang, Wonmin Byeon, Jiarui Xu, Liang Feng, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu. 21 Jan 2025.
Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration (MICCAI, 2024). Long Lei, Jun Zhou, Jialun Pei, Baoliang Zhao, Yueming Jin, Yuen-Chun Jeremy Teoh, Jing Qin, Pheng-Ann Heng. 20 Jan 2025.
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map (NeurIPS, 2024). Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, Jian Wu, Bo Xu, Guoqi Li. 16 Nov 2024.
Scene Graph Generation with Role-Playing Large Language Models (NeurIPS, 2024). Guikun Chen, Jin Li, Wenguan Wang. 20 Oct 2024.
HSR-Enhanced Sparse Attention Acceleration. Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song. 14 Oct 2024.
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures (ICLR, 2024). Junxuan Wang, Xuyang Ge, Wentao Shu, Qiong Tang, Yunhua Zhou, Zhengfu He, Xipeng Qiu. 09 Oct 2024.
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions (ICLR, 2024). Zhihao He, Hang Yu, Zi Gong, Shizhan Liu, Jia-Nan Li, Weiyao Lin. 09 Oct 2024.
Gated Slot Attention for Efficient Linear-Time Sequence Modeling (NeurIPS, 2024). Yu Zhang, Aaron Courville, Ruijie Zhu, Yue Zhang, Leyang Cui, ..., Freda Shi, Bailin Wang, Wei Bi, P. Zhou, Guohong Fu. 11 Sep 2024.
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data. Calvin Tan, Jerome Wang. 07 Aug 2024.
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers. Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu. 24 Jun 2024.
tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity. Xing Fang, Chenpeng Yu, Shiye Tian, Hui Liu. 24 Jun 2024.
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences. Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li. 12 Jun 2024.
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models. Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin. 11 Jun 2024.
Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism. Chang Zong, Jian Shao, Weiming Lu, Yueting Zhuang. 06 Jun 2024.
D-FaST: Cognitive Signal Decoding with Disentangled Frequency-Spatial-Temporal Attention. WeiGuo Chen, Changjian Wang, Kele Xu, Yuan Yuan, Yanru Bai, Dongsong Zhang. 02 Jun 2024.
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention. Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong. 27 May 2024.
Demystify Mamba in Vision: A Linear Attention Perspective. Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang. 26 May 2024.
RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis. Yaxin Liu, Yan Zhou, Ziming Li, Jinchuan Zhang, Yu Shang, Chenyang Zhang, Songlin Hu. 20 May 2024.
Improving Transformers with Dynamically Composable Multi-Head Attention (ICML, 2024). Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan. 14 May 2024.
BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations. Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, ..., Lifang He, Chen Tang, Luke Huan, Wei Wang, Carl Yang. 30 Apr 2024.
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges. Badri N. Patro, Vijay Srinivas Agneeswaran. 24 Apr 2024.
A Survey on Efficient Inference for Large Language Models. Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang. 22 Apr 2024.