ResearchTrend.AI
Retentive Network: A Successor to Transformer for Large Language Models

17 July 2023
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
    LRM
ArXiv (abs) | PDF | HTML | HuggingFace (172 upvotes) | GitHub

Papers citing "Retentive Network: A Successor to Transformer for Large Language Models"

50 / 304 papers shown
On Structured State-Space Duality
Jerry Yao-Chieh Hu
Xiwen Zhang
Weimin Wu
Han Liu
Han Liu
159
1
0
24 Dec 2025
Continuous-Time Homeostatic Dynamics for Reentrant Inference Models
Byung Gyu Chae
32
4
0
04 Dec 2025
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
N. Bui
Shubham Sharma
Simran Lamba
Saumitra Mishra
Rex Ying
150
4
0
03 Dec 2025
Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression
Liangzu Peng
Aditya Chattopadhyay
Luca Zancato
Elvis Nunez
Wei Xia
Stefano Soatto
518
3
0
26 Nov 2025
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Y. Fu
Xin Dong
Shizhe Diao
Matthijs Van Keirsbilck
Hanrong Ye
...
Maksim Khadkevich
A. Keller
Jan Kautz
Y. Lin
Pavlo Molchanov
204
7
0
24 Nov 2025
Selective Rotary Position Embedding
Sajad Movahedi
Timur Carstensen
Arshia Afzal
Frank Hutter
Antonio Orvieto
Volkan Cevher
378
2
0
21 Nov 2025
Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
Yicong Zheng
Kevin L. McKee
Thomas Miconi
Zacharie Bugaud
Mick van Gelderen
Jed McCaleb
RALM
112
2
0
20 Nov 2025
CAMS: Towards Compositional Zero-Shot Learning via Gated Cross-Attention and Multi-Space Disentanglement
Pan Yang
Cheng Deng
J. Yang
Han Zhao
Yun-Hai Liu
Yuling Chen
Xiaoli Ruan
Yanping Chen
CoGe
373
0
0
20 Nov 2025
Dynamic Nested Hierarchies: Pioneering Self-Evolution in Machine Learning Architectures for Lifelong Intelligence
Akbar Anbar Jafari
C. Ozcinar
G. Anbarjafari
AI4CE
158
1
0
18 Nov 2025
TNT: Improving Chunkwise Training for Test-Time Memorization
Zeman Li
Ali Behrouz
Yuan Deng
Peilin Zhong
Praneeth Kacham
Mahdi Karami
Meisam Razaviyayn
Vahab Mirrokni
266
2
0
10 Nov 2025
Recursive Dynamics in Fast-Weights Homeostatic Reentry Networks: Toward Reflective Intelligence
B. G. Chae
219
5
0
10 Nov 2025
Attention and Compression is all you need for Controllably Efficient Language Models
Jatin Prakash
N. Jethani
Rajesh Ranganath
MQ VLM
520
2
0
07 Nov 2025
Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning
Farhad Rezazadeh
Hatim Chergui
Mérouane Debbah
Houbing Song
Dusit Niyato
Lingjia Liu
205
2
0
04 Nov 2025
Apriel-H1: Towards Efficient Enterprise Reasoning Models
Oleksiy Ostapenko
Luke Kumar
Raymond Li
Denis Kocetkov
J. Lamy-Poirier
...
Sébastien Paquet
Srinivas Sunkara
Valérie Bécaert
Sathwik Tejaswi Madhusudhan
Torsten Scholak
LRM
197
2
0
04 Nov 2025
UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs
Zhe Liu
Jinghua Hou
Xiaoqing Ye
Jingdong Wang
Hengshuang Zhao
X. Bai
163
2
0
03 Nov 2025
Transformers as Intrinsic Optimizers: Forward Inference through the Energy Principle
Ruifeng Ren
Sheng Ouyang
Huayi Tang
Yong Liu
243
2
0
02 Nov 2025
FlashEVA: Accelerating LLM inference via Efficient Attention
Juan Gabriel Kostelec
Qinghai Guo
204
0
0
01 Nov 2025
Higher-order Linear Attention
Yifan Zhang
Zhen Qin
Quanquan Gu
103
1
0
31 Oct 2025
Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism
Yuhua Jiang
Shuang Cheng
Yihao Liu
Ermo Hua
Che Jiang
Weigao Sun
Yu Cheng
Feifei Gao
Biqing Qi
Bowen Zhou
CLL KELM MoE
111
0
0
30 Oct 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
180
41
0
30 Oct 2025
Alias-Free ViT: Fractional Shift Invariance via Linear Attention
H. Michaeli
Daniel Soudry
221
1
0
26 Oct 2025
Energy-Efficient Domain-Specific Artificial Intelligence Models and Agents: Pathways and Paradigms
Abhijit Chatterjee
N. Jha
Jonathan D. Cohen
Thomas Griffiths
Hongjing Lu
Diana Marculescu
Ashiqur Rasul
Keshab K. Parhi
LLMAG AI4CE
486
2
0
24 Oct 2025
Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
Mutian He
Philip N. Garner
CLL
301
2
0
23 Oct 2025
From Masks to Worlds: A Hitchhiker's Guide to World Models
Jinbin Bai
Yu Lei
H. Wu
Yuchen Zhu
Shufan Li
Yi Xin
Xiangtai Li
Molei Tao
Aditya Grover
Ming-Hsuan Yang
VGen SyDa
238
3
0
23 Oct 2025
Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity
Pratik Poudel
KELM
184
0
0
23 Oct 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Ling Team
Bin Han
Caizhi Tang
Chen Liang
Donghao Zhang
...
Yue Zhang
Yuchen Fang
Zibin Lin
Zixuan Cheng
Jun Zhou
LRM
270
4
0
22 Oct 2025
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Jiaqi Leng
Xiang Hu
Junxiong Wang
Jianguo Li
Wei Wu
Yucheng Lu
188
2
0
20 Oct 2025
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Eran Malach
Omid Saremi
Sinead Williamson
Arwen Bradley
Aryo Lotfi
Emmanuel Abbe
J. Susskind
Etai Littwin
203
1
0
16 Oct 2025
Chimera: State Space Models Beyond Sequences
Aakash Lahoti
Tanya Marwah
Ratish Puduppully
Albert Gu
Mamba GNN AI4CE
296
2
0
14 Oct 2025
HeSRN: Representation Learning On Heterogeneous Graphs via Slot-Aware Retentive Network
Yifan Lu
Ziyun Zou
Belal Alsinglawi
Islam Al-qudah
Izzat Alsmadi
Feilong Tang
Pengfei Jiao
Shoaib Jameel
Imran Razzak
157
0
0
10 Oct 2025
Design Principles for Sequence Models via Coefficient Dynamics
Jerome Sieber
Antonio Orvieto
Melanie Zeilinger
Carmen Amo Alonso
159
0
0
10 Oct 2025
Recurrence-Complete Frame-based Action Models
Michael Keiblinger
169
2
0
08 Oct 2025
Artificial Hippocampus Networks for Efficient Long-Context Modeling
Yunhao Fang
Weihao Yu
Shu Zhong
Qinghao Ye
Xuehan Xiong
Lai Wei
198
5
0
08 Oct 2025
Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space
Tomás Figliolia
Nicholas Alonso
Rishi Iyer
Quentin Anthony
Beren Millidge
MQ
173
2
0
06 Oct 2025
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
Adam Filipek
128
2
0
02 Oct 2025
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
Hongkang Li
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
Meng Wang
MLT
214
1
0
01 Oct 2025
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Yifei Zuo
Yutong Yin
Zhichen Zeng
Ang Li
Banghua Zhu
Zhaoran Wang
176
1
0
01 Oct 2025
VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
Abdelilah Aitrouga
Youssef Hmamouche
Amal El Fallah Seghrouchni
VGen
284
0
0
30 Sep 2025
TTT3R: 3D Reconstruction as Test-Time Training
Xingyu Chen
Yue Chen
Yuliang Xiu
Andreas Geiger
Anpei Chen
3DV
391
48
0
30 Sep 2025
Context-Driven Performance Modeling for Causal Inference Operators on Neural Processing Units
Neelesh Gupta
Rakshith Jayanth
Dhruv Parikh
Viktor Prasanna
187
0
0
29 Sep 2025
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
Jintao Zhang
Haoxu Wang
Kai Jiang
Shuo Yang
Kaiwen Zheng
...
Min Zhao
Ion Stoica
Joseph E. Gonzalez
Jun Zhu
Jianfei Chen
223
21
0
28 Sep 2025
ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference
Haojie Ouyang
Jianwei Lv
Lei Ren
Chen Wei
Xiaojie Wang
Fangxiang Feng
VLM
222
0
0
28 Sep 2025
StateX: Enhancing RNN Recall via Post-training State Expansion
Xingyu Shen
Yingfa Chen
Zhen Leng Thai
Xu Han
Zhiyuan Liu
Maosong Sun
154
1
0
26 Sep 2025
Enhancing Linear Attention with Residual Learning
Xunhao Lai
Jialiang Kang
Jianqiao Lu
Tong Lin
Pengyu Zhao
KELM CLL
143
0
0
24 Sep 2025
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
Mamba
280
1
0
23 Sep 2025
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
Jiahao Huo
Pengxiao Lin
Zhiwei Wang
Zhi-Qin John Xu
Mamba
251
3
0
22 Sep 2025
Large Language Model Scaling Laws for Neural Quantum States in Quantum Chemistry
Oliver Knitter
Dan Zhao
Stefan Leichenauer
S. Veerapaneni
ELM LRM
253
0
0
16 Sep 2025
Point-Plane Projections for Accurate LiDAR Semantic Segmentation in Small Data Scenarios
Simone Mosco
Daniel Fusaro
Wanmeng Li
Emanuele Menegatti
Alberto Pretto
3DPC
118
1
0
13 Sep 2025
Elucidating the Design Space of Decay in Linear Attention
Zhen Qin
Xuyang Shen
Yiran Zhong
145
2
0
05 Sep 2025
AudioRWKV: Efficient and Stable Bidirectional RWKV for Audio Pattern Recognition
Jiayu Xiong
Jun Xue
Jianlong Kwan
Jing Wang
143
0
0
02 Sep 2025