Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.08621
Cited By
Retentive Network: A Successor to Transformer for Large Language Models
17 July 2023
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Retentive Network: A Successor to Transformer for Large Language Models"
50 / 207 papers shown
Title
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Z. Qiu
Z. Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
18
0
0
10 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
48
0
0
02 May 2025
Graph Fourier Transformer with Structure-Frequency Information
Yonghui Zhai
Yang Zhang
Minghao Shang
Lihua Pang
Yaxin Ren
31
0
0
28 Apr 2025
WuNeng: Hybrid State with Attention
Liu Xiao
Li Zhiyuan
Lin Yueyu
41
0
0
27 Apr 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
81
0
0
22 Apr 2025
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Ali Behrouz
Meisam Razaviyayn
Peilin Zhong
Vahab Mirrokni
36
0
0
17 Apr 2025
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
96
0
0
17 Apr 2025
Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner
Liu Xiao
Li Zhiyuan
Lin Yueyu
28
0
0
11 Apr 2025
Compound and Parallel Modes of Tropical Convolutional Neural Networks
Mingbo Li
Liying Liu
Ye Luo
33
0
0
09 Apr 2025
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo Yin
Jiao-Long Cao
Ming-Ming Cheng
Qibin Hou
3DPC
MDE
48
0
0
07 Apr 2025
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li
Davoud Ataee Tarzanagh
A. S. Rawat
Maryam Fazel
Samet Oymak
23
0
0
06 Apr 2025
ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
Nicola Muca Cirone
C. Salvi
41
1
0
01 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
31
0
0
30 Mar 2025
RSRWKV: A Linear-Complexity 2D Attention Mechanism for Efficient Remote Sensing Vision Task
Chunshan Li
Rong Wang
Xiaofei Yang
Dianhui Chu
72
0
0
26 Mar 2025
Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures
Tim Seizinger
Florin-Alexandru Vasluianu
Marcos V. Conde
Zongwei Wu
Radu Timofte
44
0
0
20 Mar 2025
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Shibo Jie
Yehui Tang
Kai Han
Zhi-Hong Deng
Jing Han
89
0
0
20 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
59
1
0
18 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
58
0
0
17 Mar 2025
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
M. Beck
Korbinian Poppel
Phillip Lippe
Richard Kurle
P. Blies
G. Klambauer
Sebastian Böck
Sepp Hochreiter
LRM
40
0
0
17 Mar 2025
Atlas: Multi-Scale Attention Improves Long Context Image Modeling
Kumar Krishna Agrawal
Long Lian
L. Liu
Natalia Harguindeguy
Boyi Li
Alexander Bick
Maggie Chung
Trevor Darrell
Adam Yala
ViT
50
0
0
16 Mar 2025
Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li
Jinyue Yang
Guoqi Li
Huan Wang
53
0
0
13 Mar 2025
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
Sajad Movahedi
Felix Sarnthein
Nicola Muca Cirone
Antonio Orvieto
46
2
0
13 Mar 2025
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer
Yury Belousov
S. Voloshynovskiy
AAML
37
0
0
13 Mar 2025
BioMoDiffuse: Physics-Guided Biomechanical Diffusion for Controllable and Authentic Human Motion Synthesis
Zixi Kang
Xinghan Wang
Yadong Mu
VGen
60
0
0
08 Mar 2025
Conformal Transformations for Symmetric Power Transformers
Saurabh Kumar
Jacob Buckman
Carles Gelada
Sean Zhang
65
0
0
05 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu-Xi Cheng
64
0
0
03 Mar 2025
ProDapt: Proprioceptive Adaptation using Long-term Memory Diffusion
Federico Pizarro Bejarano
Bryson Jones
Daniel Pastor Moreno
J. Bowkett
Paul Backes
Angela P. Schoellig
31
0
0
28 Feb 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
63
5
0
28 Feb 2025
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms
Feiyang Chen
Yu Cheng
Lei Wang
Yuqing Xia
Ziming Miao
...
Fan Yang
J. Xue
Zhi Yang
M. Yang
H. Chen
71
1
0
24 Feb 2025
Associative Recurrent Memory Transformer
Ivan Rodkin
Yuri Kuratov
Aydar Bulatov
Mikhail Burtsev
65
2
0
17 Feb 2025
State-space models are accurate and efficient neural operators for dynamical systems
Zheyuan Hu
Nazanin Ahmadi Daryakenari
Qianli Shen
Kenji Kawaguchi
George Karniadakis
Mamba
AI4CE
64
10
0
28 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
72
9
0
11 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
106
592
0
31 Dec 2024
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
Elvis Nunez
L. Zancato
Benjamin Bowman
Aditya Golatkar
W. Xia
Stefano Soatto
73
2
0
17 Dec 2024
MAL: Cluster-Masked and Multi-Task Pretraining for Enhanced xLSTM Vision Performance
Wenjun Huang
Jianguo Hu
79
0
0
14 Dec 2024
Streaming Detection of Queried Event Start
Cristobal Eyzaguirre
Eric Tang
S. Buch
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
69
0
0
04 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
87
4
0
28 Nov 2024
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
Yuan Zhou
Qingshan Xu
Jiequan Cui
Junbao Zhou
Jing Zhang
Richang Hong
H. Zhang
ViT
73
0
0
25 Nov 2024
Financial Risk Assessment via Long-term Payment Behavior Sequence Folding
Yiran Qiao
Yateng Tang
Xiang Ao
Qi Yuan
Ziming Liu
Chen Shen
Xuehao Zheng
62
0
0
22 Nov 2024
Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong
Y. Fu
Shizhe Diao
Wonmin Byeon
Zijia Chen
...
Min-Hung Chen
Yoshi Suhara
Y. Lin
Jan Kautz
Pavlo Molchanov
Mamba
97
21
0
20 Nov 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou
Man Yao
Kexin Wang
Yuqi Pan
Ruijie Zhu
Yiran Zhong
Yu Qiao
J. Wu
Bo Xu
Guoqi Li
41
4
0
16 Nov 2024
Retentive Neural Quantum States: Efficient Ansätze for Ab Initio Quantum Chemistry
Oliver Knitter
Dan Zhao
J. Stokes
M. Ganahl
Stefan Leichenauer
S. Veerapaneni
37
1
0
06 Nov 2024
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
Masakazu Yoshimura
Teruaki Hayashi
Yota Maeda
Mamba
66
2
0
06 Nov 2024
LiVOS: Light Video Object Segmentation with Gated Linear Matching
Qin Liu
Jianfeng Wang
Z. Yang
Linjie Li
Kevin Qinghong Lin
Marc Niethammer
Lijuan Wang
VOS
42
1
0
05 Nov 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue
Yulin Wang
Bingyi Kang
Yizeng Han
Shenzhi Wang
Shiji Song
Jiashi Feng
Gao Huang
VLM
38
16
0
04 Nov 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
Axel Roebel
31
1
0
30 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba
RALM
29
1
0
24 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
62
5
0
22 Oct 2024
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Hao Wu
Donglin Bai
Shiqi Jiang
Qianxi Zhang
Y. Yang
Ting Cao
Fengyuan Xu
Yunxin Liu
Fengyuan Xu
42
0
0
19 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
32
4
0
18 Oct 2024
1
2
3
4
5
Next