Retentive Network: A Successor to Transformer for Large Language Models

17 July 2023
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
    LRM

Papers citing "Retentive Network: A Successor to Transformer for Large Language Models"

Showing 50 of 208 citing papers.
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Wei He
Kai Han
Yehui Tang
Chengcheng Wang
Yujie Yang
Tianyu Guo
Yunhe Wang
Mamba
53
25
0
26 Feb 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Ziheng Jiang
Haibin Lin
Yinmin Zhong
Qi Huang
Yangrui Chen
...
Zhe Li
X. Jia
Jia-jun Ye
Xin Jin
Xin Liu
LRM
38
99
0
23 Feb 2024
Do Efficient Transformers Really Save Computation?
Kai-Bo Yang
Jan Ackermann
Zhenyu He
Guhao Feng
Bohang Zhang
Yunzhen Feng
Qiwei Ye
Di He
Liwei Wang
23
8
0
21 Feb 2024
Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
Siqi Miao
Zhiyuan Lu
Mia Liu
Javier Duarte
Pan Li
29
4
0
19 Feb 2024
Data Engineering for Scaling Language Models to 128K Context
Yao Fu
Rameswar Panda
Xinyao Niu
Xiang Yue
Hanna Hajishirzi
Yoon Kim
Hao-Chun Peng
MoE
36
115
0
15 Feb 2024
Bidirectional Generative Pre-training for Improving Time Series Representation Learning
Ziyang Song
Qincheng Lu
He Zhu
Yue Li
AI4TS
14
3
0
14 Feb 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong
Xinyu Yang
Zhenyu (Allen) Zhang
Zhangyang Wang
Yuejie Chi
Beidi Chen
27
47
0
14 Feb 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi
Michele Casoni
Alessandro Betti
Tommaso Guidi
Marco Gori
S. Melacci
16
9
0
12 Feb 2024
FAST: Factorizable Attention for Speeding up Transformers
Armin Gerami
Monte Hoover
P. S. Dulepet
R. Duraiswami
22
0
0
12 Feb 2024
Improving Token-Based World Models with Parallel Observation Prediction
Lior Cohen
Kaixin Wang
Bingyi Kang
Shie Mannor
10
2
0
08 Feb 2024
Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures
Vincent Abbott
40
4
0
08 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
19
69
0
06 Feb 2024
UniMem: Towards a Unified View of Long-Context Large Language Models
Junjie Fang
Likai Tang
Hongzhe Bi
Yujia Qin
Si Sun
...
Xiaodong Shi
Sen Song
Yankai Lin
Zhiyuan Liu
Maosong Sun
16
3
0
05 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
29
26
0
05 Feb 2024
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang
Mahsa Salmani
Parsa Omidi
Xiangyu Ren
Mehdi Rezagholizadeh
A. Eshaghi
LRM
29
35
0
03 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
95
77
0
01 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
34
1
0
01 Feb 2024
BlackMamba: Mixture of Experts for State-Space Models
Quentin G. Anthony
Yury Tokpanov
Paolo Glorioso
Beren Millidge
20
21
0
01 Feb 2024
Forecasting VIX using Bayesian Deep Learning
Héctor J. Hortúa
Andrés Mora-Valencia
BDL
OOD
13
4
0
30 Jan 2024
In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek
Bailin Wang
Yoon Kim
Jacob Andreas
LRM
ReLM
43
40
0
23 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
32
699
0
17 Jan 2024
RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks
Haowen Hou
F. Richard Yu
AI4TS
28
19
0
17 Jan 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
62
21
0
09 Jan 2024
SpiNNaker2: A Large-Scale Neuromorphic System for Event-Based and Asynchronous Machine Learning
Hector A. Gonzalez
Jiaxin Huang
Florian Kelber
Khaleelulla Khan Nazeer
Tim Langer
...
Bernhard Vogginger
Timo C. Wunderlich
Yexin Yan
Mahmoud Akl
Christian Mayr
19
15
0
09 Jan 2024
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Maciej Pióro
Kamil Ciebiera
Krystian Król
Jan Ludziejewski
Michał Krutul
Jakub Krajewski
Szymon Antoniak
Piotr Miłoś
Marek Cygan
Sebastian Jaszczur
MoE
Mamba
20
54
0
08 Jan 2024
Multi-relational Graph Diffusion Neural Network with Parallel Retention for Stock Trends Classification
Zinuo You
Pengju Zhang
Jin Zheng
John Cartlidge
AIFin
DiffM
13
5
0
05 Jan 2024
FlashVideo: A Framework for Swift Inference in Text-to-Video Generation
Bin Lei
Le Chen
Caiwen Ding
VGen
20
1
0
30 Dec 2023
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
...
Qinghua Xu
Qun Liu
Jun Yao
Chao Xu
Dacheng Tao
59
15
0
27 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang
Bailin Wang
Yikang Shen
Rameswar Panda
Yoon Kim
40
138
0
11 Dec 2023
SpeechAct: Towards Generating Whole-body Motion from Speech
Jinsong Zhang
Minjie Zhu
Yuxiang Zhang
Yebin Liu
Kun Li
21
0
0
29 Nov 2023
Large Language Models Meet Computer Vision: A Brief Survey
Raby Hamadi
LM&MA
13
4
0
28 Nov 2023
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Shida Wang
Qianxiao Li
17
12
0
24 Nov 2023
Alpha Zero for Physics: Application of Symbolic Regression with Alpha Zero to find the analytical methods in physics
Yoshihiro Michishita
AI4CE
17
2
0
21 Nov 2023
To Transformers and Beyond: Large Language Models for the Genome
Micaela Elisa Consens
Cameron Dufault
Michael Wainberg
Duncan Forster
Mehran Karimzadeh
Hani Goodarzi
Fabian J. Theis
Alan Moses
Bo Wang
LM&MA
MedIm
11
24
0
13 Nov 2023
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling
Tobias Katsch
AI4TS
30
28
0
03 Nov 2023
ViR: Towards Efficient Vision Retention Backbones
Ali Hatamizadeh
Michael Ranzinger
Shiyi Lan
Jose M. Alvarez
Sanja Fidler
Jan Kautz
GNN
22
1
0
30 Oct 2023
Circuit as Set of Points
Jialv Zou
Xinggang Wang
Jiahao Guo
Wenyu Liu
Qian Zhang
Chang Huang
GNN
3DV
3DPC
17
0
0
26 Oct 2023
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Huaijie Wang
Lingxiao Ma
Fan Yang
Ruiping Wang
Yi Wu
Furu Wei
MQ
12
95
0
17 Oct 2023
Transport-Hub-Aware Spatial-Temporal Adaptive Graph Transformer for Traffic Flow Prediction
Xiao Xu
Lei Zhang
Bailong Liu
Zhi Liang
Xuefei Zhang
AI4TS
19
1
0
12 Oct 2023
Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
Ivan Lee
Nan Jiang
Taylor Berg-Kirkpatrick
26
12
0
12 Oct 2023
Exponential Quantum Communication Advantage in Distributed Inference and Learning
H. Michaeli
D. Gilboa
Daniel Soudry
Jarrod R. McClean
FedML
11
0
0
11 Oct 2023
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Huiqiang Jiang
Qianhui Wu
Xufang Luo
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
Lili Qiu
RALM
101
179
0
10 Oct 2023
RetSeg: Retention-based Colorectal Polyps Segmentation Network
Khaled Elkarazle
V. Raman
Caslon Chua
P. Then
MedIm
ViT
20
1
0
09 Oct 2023
USTEP: Spatio-Temporal Predictive Learning under A Unified View
Cheng Tan
Jue Wang
Zhangyang Gao
Siyuan Li
Stan Z. Li
36
1
0
09 Oct 2023
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
S. Bhattamishra
Arkil Patel
Phil Blunsom
Varun Kanade
19
40
0
04 Oct 2023
Towards Causal Foundation Model: on Duality between Causal Inference and Attention
Jiaqi Zhang
Joel Jennings
Agrin Hilmkil
Nick Pawlowski
Cheng Zhang
Chao Ma
CML
41
13
0
01 Oct 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
20
34
0
23 Sep 2023
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan
Huaibo Huang
Mingrui Chen
Hongmin Liu
Ran He
ViT
30
73
0
20 Sep 2023
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Yang Li
Liangzhen Lai
Shangguan Yuan
Forrest N. Iandola
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
21
2
0
14 Sep 2023
Toward a Deeper Understanding: RetNet Viewed through Convolution
Chenghao Li
Chaoning Zhang
ViT
24
7
0
11 Sep 2023