ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Retentive Network: A Successor to Transformer for Large Language Models

17 July 2023
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
    LRM

Papers citing "Retentive Network: A Successor to Transformer for Large Language Models"

50 / 207 papers shown
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
24
0
0
17 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
47
18
0
17 Oct 2024
On Divergence Measures for Training GFlowNets
Tiago da Silva
Eliezer de Souza da Silva
Diego Mesquita
BDL
24
1
0
12 Oct 2024
Efficiently Scanning and Resampling Spatio-Temporal Tasks with Irregular Observations
Bryce Ferenczi
Michael G. Burke
Tom Drummond
26
0
0
11 Oct 2024
Parameter-Efficient Fine-Tuning of State Space Models
Kevin Galim
Wonjun Kang
Yuchen Zeng
H. Koo
Kangwook Lee
29
4
0
11 Oct 2024
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
Yingfa Chen
Xinrong Zhang
Shengding Hu
Xu Han
Zhiyuan Liu
Maosong Sun
Mamba
51
2
0
09 Oct 2024
MatMamba: A Matryoshka State Space Model
Abhinav Shukla
Sai H. Vemprala
Aditya Kusupati
Ashish Kapoor
Mamba
28
0
0
09 Oct 2024
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Junxuan Wang
Xuyang Ge
Wentao Shu
Qiong Tang
Yunhua Zhou
Zhengfu He
Xipeng Qiu
27
7
0
09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
Di Liang
Xiaofei Li
19
0
0
09 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
Jianguo Li
Weiyao Lin
VLM
33
1
0
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
80
0
0
09 Oct 2024
Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension
Ning Wang
Zekun Li
Tongxin Bai
Guoqi Li
27
0
0
05 Oct 2024
RetCompletion: High-Speed Inference Image Completion with Retentive Network
Yueyang Cang
P. Hu
Xiaoteng Zhang
Xingtong Wang
Yuhang Liu
VLM
24
0
0
05 Oct 2024
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren
Zhicong Li
Yong Liu
39
1
0
04 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
66
37
0
03 Oct 2024
Were RNNs All We Needed?
Leo Feng
Frederick Tung
Mohamed Osama Ahmed
Yoshua Bengio
Hossein Hajimirsadegh
AI4TS
23
14
1
02 Oct 2024
On the Power of Decision Trees in Auto-Regressive Language Modeling
Yulu Gan
Tomer Galanti
T. Poggio
Eran Malach
AI4CE
13
0
0
27 Sep 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
83
1
0
20 Sep 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang
Songlin Yang
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
60
15
0
11 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
26
4
0
03 Sep 2024
Shifted Window Fourier Transform And Retention For Image Captioning
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
VLM
24
0
0
25 Aug 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
43
24
0
19 Aug 2024
Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning
Tianning Zhang
Feng Liu
Yuming Yuan
Rui Su
Wanli Ouyang
Lei Bai
21
0
0
13 Aug 2024
Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness
Xiaojing Fan
Chunliang Tao
AAML
29
28
0
08 Aug 2024
What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
AI4CE
27
2
0
01 Aug 2024
Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms
Yueran Zhang
Yating Yu
Lingtong Min
Mamba
23
0
0
01 Aug 2024
LION: Linear Group RNN for 3D Object Detection in Point Clouds
Zhe Liu
Jinghua Hou
Xinyu Wang
Xiaoqing Ye
Jingdong Wang
Hengshuang Zhao
Xiang Bai
3DPC
53
11
0
25 Jul 2024
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption
Shi Luohe
Hongyi Zhang
Yao Yao
Z. Li
Zhao Hai
31
31
0
25 Jul 2024
Longhorn: State Space Models are Amortized Online Learners
Bo Liu
Rui Wang
Lemeng Wu
Yihao Feng
Peter Stone
Qian Liu
46
10
0
19 Jul 2024
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
Daniel Goldstein
Fares Obeid
Eric Alcaide
Guangyu Song
Eugene Cheah
VLM
AI4TS
24
7
0
16 Jul 2024
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Sukjun Hwang
Aakash Lahoti
Tri Dao
Albert Gu
Mamba
52
11
0
13 Jul 2024
ST-RetNet: A Long-term Spatial-Temporal Traffic Flow Prediction Method
Baichao Long
Wang Zhu
Jianli Xiao
GNN
AI4TS
18
1
0
13 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
48
112
0
11 Jul 2024
Spatial-Temporal Attention Model for Traffic State Estimation with Sparse Internet of Vehicles
Jianzhe Xue
Dongcheng Yuan
Yu Sun
Tianqi Zhang
Wenchao Xu
Haibo Zhou
Xuemin Shen
21
1
0
10 Jul 2024
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning
Xiaojie Li
Yibo Yang
Jianlong Wu
Bernard Ghanem
Liqiang Nie
Min Zhang
Mamba
36
5
0
08 Jul 2024
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Bangbang Zhou
Yadong Qu
Zixiao Wang
Zicheng Li
Boqiang Zhang
Hongtao Xie
35
1
0
08 Jul 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
67
81
0
02 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen Zhong
Yu-Neng Chuang
...
Hongye Jin
V. Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
34
17
0
01 Jul 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Hyun Joon Park
Jin Sob Kim
Wooseok Shin
Sung Won Han
DiffM
23
2
0
27 Jun 2024
Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer
Aditya Kashi
Sajal Dash
A. Tsaris
Junqi Yin
Mallikarjun Shankar
Feiyi Wang
33
0
0
24 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
26
18
0
24 Jun 2024
Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces
Zhaohui Chen
Elyas Asadi Shamsabadi
Sheng Jiang
Luming Shen
Daniel Dias-da-Costa
Mamba
35
3
0
24 Jun 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Tianyu Fu
Haofeng Huang
Xuefei Ning
Genghan Zhang
Boju Chen
...
Shiyao Li
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
41
16
0
21 Jun 2024
CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework
Shaohuang Wang
Lun Wang
Yunhan Bu
Tianwei Huang
30
2
0
18 Jun 2024
Generalisation to unseen topologies: Towards control of biological neural network activity
Laurens Engwegen
Daan Brinks
Wendelin Bohmer
MedIm
AI4CE
25
0
0
17 Jun 2024
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra
Michael Hahn
Phil Blunsom
Varun Kanade
GNN
28
8
0
13 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
28
0
0
13 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
64
54
0
11 Jun 2024
What Can We Learn from State Space Models for Machine Learning on Graphs?
Yinan Huang
Siqi Miao
Pan Li
39
7
0
09 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
29
6
0
06 Jun 2024