Retentive Network: A Successor to Transformer for Large Language Models

arXiv: 2307.08621

17 July 2023

Papers citing "Retentive Network: A Successor to Transformer for Large Language Models"

50 / 304 papers shown

On Structured State-Space Duality

Jerry Yao-Chieh Hu

Han Liu

159

1

0

24 Dec 2025

Continuous-Time Homeostatic Dynamics for Reentrant Inference Models

32

4

0

04 Dec 2025

Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs

Saumitra Mishra

150

4

0

03 Dec 2025

Gated KalmaNet: A Fading Memory Layer Through Test-Time Ridge Regression

Aditya Chattopadhyay

518

3

0

26 Nov 2025

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Matthijs Van Keirsbilck

...

Maksim Khadkevich

Pavlo Molchanov

204

7

0

24 Nov 2025

Selective Rotary Position Embedding

Timur Carstensen

Antonio Orvieto

378

2

0

21 Nov 2025

Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks

Zacharie Bugaud

Mick van Gelderen

112

2

0

20 Nov 2025

CAMS: Towards Compositional Zero-Shot Learning via Gated Cross-Attention and Multi-Space Disentanglement

373

0

0

20 Nov 2025

Dynamic Nested Hierarchies: Pioneering Self-Evolution in Machine Learning Architectures for Lifelong Intelligence

Akbar Anbar Jafari

158

1

0

18 Nov 2025

TNT: Improving Chunkwise Training for Test-Time Memorization

Praneeth Kacham

Meisam Razaviyayn

266

2

0

10 Nov 2025

Recursive Dynamics in Fast-Weights Homeostatic Reentry Networks: Toward Reflective Intelligence

219

5

0

10 Nov 2025

Attention and Compression is all you need for Controllably Efficient Language Models

Rajesh Ranganath

520

2

0

07 Nov 2025

Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning

Farhad Rezazadeh

Mérouane Debbah

205

2

0

04 Nov 2025

Apriel-H1: Towards Efficient Enterprise Reasoning Models

Oleksiy Ostapenko

J. Lamy-Poirier

...

Sébastien Paquet

Srinivas Sunkara

Valérie Bécaert

Sathwik Tejaswi Madhusudhan

Torsten Scholak

197

2

0

04 Nov 2025

UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs

Hengshuang Zhao

163

2

0

03 Nov 2025

Transformers as Intrinsic Optimizers: Forward Inference through the Energy Principle

243

2

0

02 Nov 2025

FlashEVA: Accelerating LLM inference via Efficient Attention

Juan Gabriel Kostelec

204

0

0

01 Nov 2025

Higher-order Linear Attention

103

1

0

31 Oct 2025

Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

111

0

0

30 Oct 2025

Kimi Linear: An Expressive, Efficient Attention Architecture

...

180

41

0

30 Oct 2025

Alias-Free ViT: Fractional Shift Invariance via Linear Attention

221

1

0

26 Oct 2025

Energy-Efficient Domain-Specific Artificial Intelligence Models and Agents: Pathways and Paradigms

Abhijit Chatterjee

Jonathan D. Cohen

Thomas Griffiths

Diana Marculescu

Keshab K. Parhi

486

2

0

24 Oct 2025

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction

Philip N. Garner

301

2

0

23 Oct 2025

From Masks to Worlds: A Hitchhiker's Guide to World Models

Ming-Hsuan Yang

238

3

0

23 Oct 2025

Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity

184

0

0

23 Oct 2025

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

...

270

4

0

22 Oct 2025

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models

188

2

0

20 Oct 2025

To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models

Sinead Williamson

203

1

0

16 Oct 2025

Chimera: State Space Models Beyond Sequences

Ratish Puduppully

Mamba GNN AI4CE

296

2

0

14 Oct 2025

HeSRN: Representation Learning On Heterogeneous Graphs via Slot-Aware Retentive Network

Belal Alsinglawi

Imran Razzak

157

0

0

10 Oct 2025

Design Principles for Sequence Models via Coefficient Dynamics

Antonio Orvieto

Melanie Zeilinger

Carmen Amo Alonso

159

0

0

10 Oct 2025

Recurrence-Complete Frame-based Action Models

Michael Keiblinger

169

2

0

08 Oct 2025

Artificial Hippocampus Networks for Efficient Long-Context Modeling

198

5

0

08 Oct 2025

Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space

Tomás Figliolia

Nicholas Alonso

Quentin Anthony

173

2

0

06 Oct 2025

Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction

128

2

0

02 Oct 2025

Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis

214

1

0

01 Oct 2025

Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression

176

1

0

01 Oct 2025

VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing

Abdelilah Aitrouga

Youssef Hmamouche

Amal El Fallah Seghrouchni

284

0

0

30 Sep 2025

TTT3R: 3D Reconstruction as Test-Time Training

391

48

0

30 Sep 2025

Context-Driven Performance Modeling for Causal Inference Operators on Neural Processing Units

Rakshith Jayanth

Viktor Prasanna

187

0

0

29 Sep 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

...

Joseph E. Gonzalez

223

21

0

28 Sep 2025

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

222

0

0

28 Sep 2025

StateX: Enhancing RNN Recall via Post-training State Expansion

154

1

0

26 Sep 2025

Enhancing Linear Attention with Residual Learning

143

0

0

24 Sep 2025

An overview of neural architectures for self-supervised audio representation learning from masked spectrograms

Sergios Theodoridis

280

1

0

23 Sep 2025

Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data

Zhi-Qin John Xu

251

3

0

22 Sep 2025

Large Language Model Scaling Laws for Neural Quantum States in Quantum Chemistry

Stefan Leichenauer

253

0

0

16 Sep 2025

Point-Plane Projections for Accurate LiDAR Semantic Segmentation in Small Data Scenarios

Emanuele Menegatti

118

1

0

13 Sep 2025

Elucidating the Design Space of Decay in Linear Attention

145

2

0

05 Sep 2025

AudioRWKV: Efficient and Stable Bidirectional RWKV for Audio Pattern Recognition

143

0

0

02 Sep 2025
