2307.08621
Cited By
Retentive Network: A Successor to Transformer for Large Language Models
17 July 2023
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
Papers citing
"Retentive Network: A Successor to Transformer for Large Language Models"
50 / 304 papers shown
On Structured State-Space Duality
Jerry Yao-Chieh Hu
Xiwen Zhang
Weimin Wu
Han Liu
159
1
0
24 Dec 2025
Continuous-Time Homeostatic Dynamics for Reentrant Inference Models
Byung Gyu Chae
32
4
0
04 Dec 2025
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
N. Bui
Shubham Sharma
Simran Lamba
Saumitra Mishra
Rex Ying
150
4
0
03 Dec 2025
Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression
Liangzu Peng
Aditya Chattopadhyay
Luca Zancato
Elvis Nunez
Wei Xia
Stefano Soatto
518
3
0
26 Nov 2025
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Y. Fu
Xin Dong
Shizhe Diao
Matthijs Van Keirsbilck
Hanrong Ye
...
Maksim Khadkevich
A. Keller
Jan Kautz
Y. Lin
Pavlo Molchanov
204
7
0
24 Nov 2025
Selective Rotary Position Embedding
Sajad Movahedi
Timur Carstensen
Arshia Afzal
Frank Hutter
Antonio Orvieto
Volkan Cevher
378
2
0
21 Nov 2025
Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
Yicong Zheng
Kevin L. McKee
Thomas Miconi
Zacharie Bugaud
Mick van Gelderen
Jed McCaleb
RALM
112
2
0
20 Nov 2025
CAMS: Towards Compositional Zero-Shot Learning via Gated Cross-Attention and Multi-Space Disentanglement
Pan Yang
Cheng Deng
J. Yang
Han Zhao
Yun-Hai Liu
Yuling Chen
Xiaoli Ruan
Yanping Chen
CoGe
373
0
0
20 Nov 2025
Dynamic Nested Hierarchies: Pioneering Self-Evolution in Machine Learning Architectures for Lifelong Intelligence
Akbar Anbar Jafari
C. Ozcinar
G. Anbarjafari
AI4CE
158
1
0
18 Nov 2025
TNT: Improving Chunkwise Training for Test-Time Memorization
Zeman Li
Ali Behrouz
Yuan Deng
Peilin Zhong
Praneeth Kacham
Mahdi Karami
Meisam Razaviyayn
Vahab Mirrokni
266
2
0
10 Nov 2025
Recursive Dynamics in Fast-Weights Homeostatic Reentry Networks: Toward Reflective Intelligence
B. G. Chae
219
5
0
10 Nov 2025
Attention and Compression is all you need for Controllably Efficient Language Models
Jatin Prakash
N. Jethani
Rajesh Ranganath
MQ
VLM
520
2
0
07 Nov 2025
Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning
Farhad Rezazadeh
Hatim Chergui
Mérouane Debbah
Houbing Song
Dusit Niyato
Lingjia Liu
205
2
0
04 Nov 2025
Apriel-H1: Towards Efficient Enterprise Reasoning Models
Oleksiy Ostapenko
Luke Kumar
Raymond Li
Denis Kocetkov
J. Lamy-Poirier
...
Sébastien Paquet
Srinivas Sunkara
Valérie Bécaert
Sathwik Tejaswi Madhusudhan
Torsten Scholak
LRM
197
2
0
04 Nov 2025
UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs
Zhe Liu
Jinghua Hou
Xiaoqing Ye
Jingdong Wang
Hengshuang Zhao
X. Bai
163
2
0
03 Nov 2025
Transformers as Intrinsic Optimizers: Forward Inference through the Energy Principle
Ruifeng Ren
Sheng Ouyang
Huayi Tang
Yong Liu
243
2
0
02 Nov 2025
FlashEVA: Accelerating LLM inference via Efficient Attention
Juan Gabriel Kostelec
Qinghai Guo
204
0
0
01 Nov 2025
Higher-order Linear Attention
Yifan Zhang
Zhen Qin
Quanquan Gu
103
1
0
31 Oct 2025
Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism
Yuhua Jiang
Shuang Cheng
Yihao Liu
Ermo Hua
Che Jiang
Weigao Sun
Yu Cheng
Feifei Gao
Biqing Qi
Bowen Zhou
CLL
KELM
MoE
111
0
0
30 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
180
41
0
30 Oct 2025
Alias-Free ViT: Fractional Shift Invariance via Linear Attention
H. Michaeli
Daniel Soudry
221
1
0
26 Oct 2025
Energy-Efficient Domain-Specific Artificial Intelligence Models and Agents: Pathways and Paradigms
Abhijit Chatterjee
N. Jha
Jonathan D. Cohen
Thomas Griffiths
Hongjing Lu
Diana Marculescu
Ashiqur Rasul
Keshab K. Parhi
LLMAG
AI4CE
486
2
0
24 Oct 2025
Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
Mutian He
Philip N. Garner
CLL
301
2
0
23 Oct 2025
From Masks to Worlds: A Hitchhiker's Guide to World Models
Jinbin Bai
Yu Lei
H. Wu
Yuchen Zhu
Shufan Li
Yi Xin
Xiangtai Li
Molei Tao
Aditya Grover
Ming-Hsuan Yang
VGen
SyDa
238
3
0
23 Oct 2025
Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity
Pratik Poudel
KELM
184
0
0
23 Oct 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Ling Team
Bin Han
Caizhi Tang
Chen Liang
Donghao Zhang
...
Yue Zhang
Yuchen Fang
Zibin Lin
Zixuan Cheng
Jun Zhou
LRM
270
4
0
22 Oct 2025
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Jiaqi Leng
Xiang Hu
Junxiong Wang
Jianguo Li
Wei Wu
Yucheng Lu
188
2
0
20 Oct 2025
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Eran Malach
Omid Saremi
Sinead Williamson
Arwen Bradley
Aryo Lotfi
Emmanuel Abbe
J. Susskind
Etai Littwin
203
1
0
16 Oct 2025
Chimera: State Space Models Beyond Sequences
Aakash Lahoti
Tanya Marwah
Ratish Puduppully
Albert Gu
Mamba
GNN
AI4CE
296
2
0
14 Oct 2025
HeSRN: Representation Learning On Heterogeneous Graphs via Slot-Aware Retentive Network
Yifan Lu
Ziyun Zou
Belal Alsinglawi
Islam Al-qudah
Izzat Alsmadi
Feilong Tang
Pengfei Jiao
Shoaib Jameel
Imran Razzak
157
0
0
10 Oct 2025
Design Principles for Sequence Models via Coefficient Dynamics
Jerome Sieber
Antonio Orvieto
Melanie Zeilinger
Carmen Amo Alonso
159
0
0
10 Oct 2025
Recurrence-Complete Frame-based Action Models
Michael Keiblinger
169
2
0
08 Oct 2025
Artificial Hippocampus Networks for Efficient Long-Context Modeling
Yunhao Fang
Weihao Yu
Shu Zhong
Qinghao Ye
Xuehan Xiong
Lai Wei
198
5
0
08 Oct 2025
Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space
Tomás Figliolia
Nicholas Alonso
Rishi Iyer
Quentin Anthony
Beren Millidge
MQ
173
2
0
06 Oct 2025
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
Adam Filipek
128
2
0
02 Oct 2025
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
Hongkang Li
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
Meng Wang
MLT
214
1
0
01 Oct 2025
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Yifei Zuo
Yutong Yin
Zhichen Zeng
Ang Li
Banghua Zhu
Zhaoran Wang
176
1
0
01 Oct 2025
VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
Abdelilah Aitrouga
Youssef Hmamouche
Amal El Fallah Seghrouchni
VGen
284
0
0
30 Sep 2025
TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen
Yue Chen
Yuliang Xiu
Andreas Geiger
Anpei Chen
3DV
391
48
0
30 Sep 2025
Context-Driven Performance Modeling for Causal Inference Operators on Neural Processing Units
Neelesh Gupta
Rakshith Jayanth
Dhruv Parikh
Viktor Prasanna
187
0
0
29 Sep 2025
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
Jintao Zhang
Haoxu Wang
Kai Jiang
Shuo Yang
Kaiwen Zheng
...
Min Zhao
Ion Stoica
Joseph E. Gonzalez
Jun Zhu
Jianfei Chen
223
21
0
28 Sep 2025
ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference
Haojie Ouyang
Jianwei Lv
Lei Ren
Chen Wei
Xiaojie Wang
Fangxiang Feng
VLM
222
0
0
28 Sep 2025
StateX: Enhancing RNN Recall via Post-training State Expansion
Xingyu Shen
Yingfa Chen
Zhen Leng Thai
Xu Han
Zhiyuan Liu
Maosong Sun
154
1
0
26 Sep 2025
Enhancing Linear Attention with Residual Learning
Xunhao Lai
Jialiang Kang
Jianqiao Lu
Tong Lin
Pengyu Zhao
KELM
CLL
143
0
0
24 Sep 2025
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
Mamba
280
1
0
23 Sep 2025
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
Jiahao Huo
Pengxiao Lin
Zhiwei Wang
Zhi-Qin John Xu
Mamba
251
3
0
22 Sep 2025
Large Language Model Scaling Laws for Neural Quantum States in Quantum Chemistry
Oliver Knitter
Dan Zhao
Stefan Leichenauer
S. Veerapaneni
ELM
LRM
253
0
0
16 Sep 2025
Point-Plane Projections for Accurate LiDAR Semantic Segmentation in Small Data Scenarios
Simone Mosco
Daniel Fusaro
Wanmeng Li
Emanuele Menegatti
Alberto Pretto
3DPC
118
1
0
13 Sep 2025
Elucidating the Design Space of Decay in Linear Attention
Zhen Qin
Xuyang Shen
Yiran Zhong
145
2
0
05 Sep 2025
AudioRWKV: Efficient and Stable Bidirectional RWKV for Audio Pattern Recognition
Jiayu Xiong
Jun Xue
Jianlong Kwan
Jing Wang
143
0
0
02 Sep 2025