ResearchTrend.AI
Retentive Network: A Successor to Transformer for Large Language Models

17 July 2023
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
    LRM

Papers citing "Retentive Network: A Successor to Transformer for Large Language Models"

50 / 207 papers shown
Learning 1D Causal Visual Representation with De-focus Attention Networks
Chenxin Tao
Xizhou Zhu
Shiqian Su
Lewei Lu
Changyao Tian
...
Gao Huang
Hongsheng Li
Yu Qiao
Jie Zhou
Jifeng Dai
60
1
0
06 Jun 2024
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation
Jiaming Liu
Mengzhen Liu
Zhenyu Wang
Lily Lee
Kaichen Zhou
Pengju An
Senqiao Yang
Renrui Zhang
Yandong Guo
Shanghang Zhang
LM&Ro
LRM
Mamba
27
5
0
06 Jun 2024
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Brian K Chen
Tianyang Hu
Hui Jin
Hwee Kuan Lee
Kenji Kawaguchi
35
0
0
05 Jun 2024
LongSSM: On the Length Extension of State-space Models in Language Modelling
Shida Wang
22
0
0
04 Jun 2024
You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
Zhen Qin
Yuxin Mao
Xuyang Shen
Dong Li
Jing Zhang
Yuchao Dai
Yiran Zhong
50
1
0
31 May 2024
Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning
Hengkai Tan
Songming Liu
Kai Ma
Chengyang Ying
Xingxing Zhang
Hang Su
Jun Zhu
29
2
0
30 May 2024
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu
Zilong Huang
Bencheng Liao
Jun Hao Liew
Hanshu Yan
Jiashi Feng
Xinggang Wang
65
12
0
28 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
45
3
0
28 May 2024
The Expressive Capacity of State Space Models: A Formal Language Perspective
Yash Sarrof
Yana Veitsman
Michael Hahn
Mamba
30
7
0
27 May 2024
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zhen Qin
Xuyang Shen
Weigao Sun
Dong Li
Stanley T. Birchfield
Richard I. Hartley
Yiran Zhong
42
6
0
27 May 2024
Rethinking Transformers in Solving POMDPs
Chenhao Lu
Ruizhe Shi
Yuyao Liu
Kaizhe Hu
Simon S. Du
Huazhe Xu
AI4CE
19
2
0
27 May 2024
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
22
45
0
26 May 2024
Building Vision Models upon Heat Conduction
Zhaozhi Wang
Yue Liu
Yunfan Liu
Hongtian Yu
Yaowei Wang
QiXiang Ye
ViT
VLM
50
0
0
26 May 2024
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Jerome Sieber
Carmen Amo Alonso
A. Didier
M. Zeilinger
Antonio Orvieto
AAML
42
7
0
24 May 2024
Mamba-R: Vision Mamba ALSO Needs Registers
Feng Wang
Jiahao Wang
Sucheng Ren
Guoyizhe Wei
Jieru Mei
Wei Shao
Yuyin Zhou
Alan L. Yuille
Cihang Xie
Mamba
26
19
0
23 May 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
François Yvon
Andy Zou
ELM
ALM
120
52
3
23 May 2024
Attention as an RNN
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Yoshua Bengio
Greg Mori
GNN
AI4TS
41
8
0
22 May 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
26
2
0
14 May 2024
Linearizing Large Language Models
Jean-Pierre Mercat
Igor Vasiljevic
Sedrick Scott Keh
Kushal Arora
Achal Dave
Adrien Gaidon
Thomas Kollar
32
19
0
10 May 2024
Memory Mosaics
Jianyu Zhang
Niklas Nolte
Ranajoy Sadhukhan
Beidi Chen
Léon Bottou
VLM
44
3
0
10 May 2024
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Yutao Sun
Li Dong
Yi Zhu
Shaohan Huang
Wenhui Wang
Shuming Ma
Quanlu Zhang
Jianyong Wang
Furu Wei
VLM
25
52
0
08 May 2024
A Survey on Visual Mamba
Hanwei Zhang
Ying Zhu
Dan Wang
Lijun Zhang
Tianxiang Chen
Zi Ye
Mamba
32
52
0
24 Apr 2024
Gradformer: Graph Transformer with Exponential Decay
Chuang Liu
Zelin Yao
Yibing Zhan
Xueqi Ma
Shirui Pan
Wenbin Hu
26
4
0
24 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
64
45
0
23 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
78
0
22 Apr 2024
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Xiao Wang
Shiao Wang
Yuhe Ding
Yuehang Li
Wentao Wu
...
Bowei Jiang
Chenglong Li
Yaowei Wang
Yonghong Tian
Jin Tang
Mamba
33
48
0
15 Apr 2024
HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin
Songlin Yang
Weixuan Sun
Xuyang Shen
Dong Li
Weigao Sun
Yiran Zhong
LRM
34
45
0
11 Apr 2024
Band-Attention Modulated RetNet for Face Forgery Detection
Zhida Zhang
Jie Cao
Wenkui Yang
Qihang Fan
Kai Zhou
Ran He
CVBM
ViT
20
1
0
09 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
43
73
0
08 Apr 2024
Linear Attention Sequence Parallelism
Weigao Sun
Zhen Qin
Dong Li
Xuyang Shen
Yu Qiao
Yiran Zhong
68
2
0
03 Apr 2024
Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers
Sehyun Choi
19
3
0
03 Apr 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen
Zhicheng Liu
Xutao Wang
Yuchuan Tian
Yunhe Wang
VLM
24
5
0
29 Mar 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
26
86
0
26 Mar 2024
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
49
50
0
22 Mar 2024
Hierarchical Skip Decoding for Efficient Autoregressive Text Generation
Yunqi Zhu
Xuebing Yang
Yuanyuan Wu
Wensheng Zhang
24
3
0
22 Mar 2024
EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network
Bin Wang
Fei Deng
Peifan Jiang
14
6
0
20 Mar 2024
USE: Dynamic User Modeling with Stateful Sequence Models
Zhihan Zhou
Qixiang Fang
Leonardo Neves
Francesco Barbieri
Yozen Liu
Han Liu
Maarten W. Bos
Ron Dotsch
25
0
0
20 Mar 2024
A Contact Model based on Denoising Diffusion to Learn Variable Impedance Control for Contact-rich Manipulation
Masashi Okada
Mayumi Komatsu
Tadahiro Taniguchi
DiffM
27
0
0
20 Mar 2024
On the low-shot transferability of [V]-Mamba
Diganta Misra
Jay Gala
Antonio Orvieto
Mamba
34
1
0
15 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
72
0
14 Mar 2024
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei
Xi Chen
Linzi Luo
ELM
ALM
LRM
27
7
0
12 Mar 2024
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
Changsheng Quan
Xiaofei Li
39
23
0
12 Mar 2024
TrafficGPT: Breaking the Token Barrier for Efficient Long Traffic Analysis and Generation
Jian Qu
Xiaobo Ma
Jianfeng Li
AI4TS
26
10
0
09 Mar 2024
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal
Nitin Kedia
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
Alexey Tumanov
R. Ramjee
36
147
0
04 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
38
44
0
04 Mar 2024
The Hidden Attention of Mamba Models
Ameen Ali
Itamar Zimerman
Lior Wolf
Mamba
32
57
0
03 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
53
116
0
29 Feb 2024
Theoretical Foundations of Deep Selective State-Space Models
Nicola Muca Cirone
Antonio Orvieto
Benjamin Walker
C. Salvi
Terry Lyons
Mamba
45
24
0
29 Feb 2024
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Kaiyue Wen
Xingyu Dang
Kaifeng Lyu
34
24
0
28 Feb 2024
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Wei He
Kai Han
Yehui Tang
Chengcheng Wang
Yujie Yang
Tianyu Guo
Yunhe Wang
Mamba
53
25
0
26 Feb 2024