ResearchTrend.AI
arXiv:2305.13048 (Cited By)
RWKV: Reinventing RNNs for the Transformer Era

22 May 2023
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
Stella Biderman
Huanqi Cao
Xin Cheng
Michael Chung
Matteo Grella
G. Kranthikiran
Xuming He
Haowen Hou
Jiaju Lin
Przemyslaw Kazienko
Jan Kocoń
Jiaming Kong
Bartlomiej Koptyra
Hayden Lau
Krishna Sri Ipsit Mantri
Ferdinand Mom
Atsushi Saito
Guangyu Song
Xiangru Tang
Bolun Wang
J. S. Wind
Stanislaw Wozniak
Ruichong Zhang
Zhenyuan Zhang
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu

Papers citing "RWKV: Reinventing RNNs for the Transformer Era"

50 / 388 papers shown
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou
12 Apr 2024
HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
LRM
11 Apr 2024
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
11 Apr 2024
Does Transformer Interpretability Transfer to RNNs?
Gonçalo Paulo, Thomas Marshall, Nora Belrose
09 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng, Daniel Goldstein, Quentin G. Anthony, Alon Albalak, Eric Alcaide, ..., Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Ruijie Zhu
08 Apr 2024
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
06 Apr 2024
Linear Attention Sequence Parallelism
Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong
03 Apr 2024
Optimizing the Deployment of Tiny Transformers on Low-Power MCUs
Victor J. B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini
03 Apr 2024
Long-context LLMs Struggle with Long In-context Learning
Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen
02 Apr 2024
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie
29 Mar 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang
VLM
29 Mar 2024
MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models
Peng Ding, Jiading Fang, Peng Li, Kangrui Wang, Xiaochen Zhou, Mo Yu, Jing Li, Matthew R. Walter, Hongyuan Mei
RALM, ELM
29 Mar 2024
RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
Zhichao Xu
27 Mar 2024
Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli, Armin W. Thomas, Eric N. D. Nguyen, Pragaash Ponnusamy, Bjorn Deiseroth, ..., Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli
MoE
26 Mar 2024
Onboard deep lossless and near-lossless predictive coding of hyperspectral images with line-based attention
D. Valsesia, T. Bianchi, E. Magli
26 Mar 2024
Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems
Siyu Wang, Xiaocong Chen, Lina Yao
OffRL
26 Mar 2024
DGoT: Dynamic Graph of Thoughts for Scientific Abstract Generation
Xinyu Ning, Yutong Zhao, Yitong Liu, Hongwen Yang
VLM
26 Mar 2024
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Badri N. Patro, Vijay Srinivas Agneeswaran
Mamba
22 Mar 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun, Rong Wang, Anders Søgaard
22 Mar 2024
Hierarchical Skip Decoding for Efficient Autoregressive Text Generation
Yunqi Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang
22 Mar 2024
Foundation Models for Time Series Analysis: A Tutorial and Survey
Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, Qingsong Wen
AI4TS, AI4CE
21 Mar 2024
MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, Hamed Haddadi
19 Mar 2024
Geometric Constraints in Deep Learning Frameworks: A Survey
Vibhas Kumar Vats, David J. Crandall
3DV
19 Mar 2024
On the low-shot transferability of [V]-Mamba
Diganta Misra, Jay Gala, Antonio Orvieto
Mamba
15 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang
Mamba
14 Mar 2024
LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection
Xiangrui Cai, Yang Wang, Sihan Xu, Hao Li, Ying Zhang, Zheli Liu, Xiaojie Yuan
14 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, ..., Y. Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt
ALM, ELM, LRM
13 Mar 2024
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei, Xi Chen, Linzi Luo
ELM, ALM, LRM
12 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao
Mamba
11 Mar 2024
TrafficGPT: Breaking the Token Barrier for Efficient Long Traffic Analysis and Generation
Jian Qu, Xiaobo Ma, Jianfeng Li
AI4TS
09 Mar 2024
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Xin Men, Mingyu Xu, Qingyu Zhang, Bingning Wang, Hongyu Lin, Yaojie Lu, Xianpei Han, Weipeng Chen
06 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang
ViT
04 Mar 2024
The Hidden Attention of Mamba Models
Ameen Ali, Itamar Zimerman, Lior Wolf
Mamba
03 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Christian Muraru, ..., David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Çağlar Gülçehre
Mamba
29 Feb 2024
Theoretical Foundations of Deep Selective State-Space Models
Nicola Muca Cirone, Antonio Orvieto, Benjamin Walker, C. Salvi, Terry Lyons
Mamba
29 Feb 2024
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
Mamba
26 Feb 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, ..., Zhe Li, X. Jia, Jia-jun Ye, Xin Jin, Xin Liu
LRM
23 Feb 2024
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu, Changsheng Zhao, Forrest N. Iandola, Chen Lai, Yuandong Tian, ..., Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra
ALM
22 Feb 2024
∞Bench: Extending Long Context Evaluation Beyond 100K Tokens
Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, ..., Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun
RALM, LRM
21 Feb 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller, Krista A. Ehinger, Tom Drummond
19 Feb 2024
SDiT: Spiking Diffusion Model with Transformer
Shu Yang, Hanzhi Ma, Chengting Yu, Aili Wang, Er-ping Li
DiffM
18 Feb 2024
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
RALM
16 Feb 2024
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Yaroslav Aksenov, Nikita Balagansky, Sofia Maria Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov
KELM
16 Feb 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu (Allen) Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen
14 Feb 2024
Stochastic Spiking Attention: Accelerating Attention with Stochastic Computing in Spiking Networks
Zihang Song, Prabodh Katti, Osvaldo Simeone, Bipin Rajendran
14 Feb 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, S. Melacci
12 Feb 2024
Large Language Models: A Survey
Shervin Minaee, Tomáš Mikolov, Narjes Nikzad, M. Asgari-Chenaghlu, R. Socher, Xavier Amatriain, Jianfeng Gao
ALM, LM&MA, ELM
09 Feb 2024
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
Jinghui Lu, Ziwei Yang, Yanjie Wang, Xuejing Liu, Brian Mac Namee, Can Huang
MoE
07 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
06 Feb 2024
ReLU^2 Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun
06 Feb 2024