ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.13048
  4. Cited By
RWKV: Reinventing RNNs for the Transformer Era

RWKV: Reinventing RNNs for the Transformer Era

22 May 2023
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
Stella Biderman
Huanqi Cao
Xin Cheng
Michael Chung
Matteo Grella
G. Kranthikiran
Xuming He
Haowen Hou
Jiaju Lin
Przemyslaw Kazienko
Jan Kocoñ
Jiaming Kong
Bartlomiej Koptyra
Hayden Lau
Krishna Sri Ipsit Mantri
Ferdinand Mom
Atsushi Saito
Guangyu Song
Xiangru Tang
Bolun Wang
J. S. Wind
Stansilaw Wozniak
Ruichong Zhang
Zhenyuan Zhang
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
ArXivPDFHTML

Papers citing "RWKV: Reinventing RNNs for the Transformer Era"

50 / 388 papers shown
Title
Activation Space Selectable Kolmogorov-Arnold Networks
Activation Space Selectable Kolmogorov-Arnold Networks
Zhuoqin Yang
Jiansong Zhang
Xiaoling Luo
Zheng Lu
Linlin Shen
29
6
0
15 Aug 2024
Treat Stillness with Movement: Remote Sensing Change Detection via
  Coarse-grained Temporal Foregrounds Mining
Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining
Xixi Wang
Zitian Wang
Jingtao Jiang
Lan Chen
Xiao Wang
Bo Jiang
VGen
17
0
0
15 Aug 2024
Scaling Deep Learning Computation over the Inter-Core Connected
  Intelligence Processor with T10
Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10
Yiqi Liu
Yuqi Xue
Yu Cheng
Lingxiao Ma
Ziming Miao
Jilong Xue
Jian Huang
GNN
16
1
0
09 Aug 2024
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models
DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models
Zifeng Ding
Yifeng Li
Yuan He
Antonio Norelli
Jingcheng Wu
Volker Tresp
Yunpu Ma
Michael Bronstein
Mamba
29
3
0
08 Aug 2024
Logistic Regression makes small LLMs strong and explainable
  "tens-of-shot" classifiers
Logistic Regression makes small LLMs strong and explainable "tens-of-shot" classifiers
Marcus Buckmann
Edward Hill
16
1
0
06 Aug 2024
Why Perturbing Symbolic Music is Necessary: Fitting the Distribution of
  Never-used Notes through a Joint Probabilistic Diffusion Model
Why Perturbing Symbolic Music is Necessary: Fitting the Distribution of Never-used Notes through a Joint Probabilistic Diffusion Model
Shipei Liu
Xiaoya Fan
Guowei Wu
DiffM
16
1
0
04 Aug 2024
What comes after transformers? -- A selective survey connecting ideas in
  deep learning
What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
AI4CE
27
2
0
01 Aug 2024
Autogenic Language Embedding for Coherent Point Tracking
Autogenic Language Embedding for Coherent Point Tracking
Zikai Song
Ying Tang
Run Luo
Lintao Ma
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
39
3
0
30 Jul 2024
LION: Linear Group RNN for 3D Object Detection in Point Clouds
LION: Linear Group RNN for 3D Object Detection in Point Clouds
Zhe Liu
Jinghua Hou
Xinyu Wang
Xiaoqing Ye
Jingdong Wang
Hengshuang Zhao
Xiang Bai
3DPC
45
11
0
25 Jul 2024
Keep the Cost Down: A Review on Methods to Optimize LLM' s KV-Cache
  Consumption
Keep the Cost Down: A Review on Methods to Optimize LLM' s KV-Cache Consumption
Shi Luohe
Hongyi Zhang
Yao Yao
Z. Li
Zhao Hai
31
17
0
25 Jul 2024
Enhancing Model Performance: Another Approach to Vision-Language
  Instruction Tuning
Enhancing Model Performance: Another Approach to Vision-Language Instruction Tuning
Vedanshu
M. M. Tripathi
Bhavnesh Jaint
MLLM
VLM
22
0
0
25 Jul 2024
Banyan: Improved Representation Learning with Explicit Structure
Banyan: Improved Representation Learning with Explicit Structure
Mattia Opper
N. Siddharth
23
1
0
25 Jul 2024
Enhancing Environmental Monitoring through Multispectral Imaging: The
  WasteMS Dataset for Semantic Segmentation of Lakeside Waste
Enhancing Environmental Monitoring through Multispectral Imaging: The WasteMS Dataset for Semantic Segmentation of Lakeside Waste
Qinfeng Zhu
Ningxin Weng
Lei Fan
Yuanzhi Cai
31
0
0
24 Jul 2024
Attention Is All You Need But You Don't Need All Of It For Inference of
  Large Language Models
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
Georgy Tyukin
G. Dovonon
Jean Kaddour
Pasquale Minervini
LRM
23
0
0
22 Jul 2024
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Hanlin Tang
Yang Lin
Jing Lin
Qingsen Han
Shikuan Hong
Yiwu Yao
Gongyi Wang
MQ
29
26
0
22 Jul 2024
Longhorn: State Space Models are Amortized Online Learners
Longhorn: State Space Models are Amortized Online Learners
Bo Liu
Rui Wang
Lemeng Wu
Yihao Feng
Peter Stone
Qian Liu
46
10
0
19 Jul 2024
Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
Alessandro Pierro
Steven Abreu
MQ
Mamba
33
6
0
17 Jul 2024
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill
  and Extreme KV-Cache Compression
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
Daniel Goldstein
Fares Obeid
Eric Alcaide
Guangyu Song
Eugene Cheah
VLM
AI4TS
24
7
0
16 Jul 2024
Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV
Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV
Zhiwen Yang
Hui Zhang
Dan Zhao
Bingzheng Wei
Bingzheng Wei
Yan Xu
MedIm
41
10
0
14 Jul 2024
Low-Rank Interconnected Adaptation Across Layers
Low-Rank Interconnected Adaptation Across Layers
Yibo Zhong
Yao Zhou
OffRL
MoE
36
1
0
13 Jul 2024
Sensorimotor Attention and Language-based Regressions in Shared Latent
  Variables for Integrating Robot Motion Learning and LLM
Sensorimotor Attention and Language-based Regressions in Shared Latent Variables for Integrating Robot Motion Learning and LLM
Kanata Suzuki
Tetsuya Ogata
24
2
0
12 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and
  Low-precision
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
32
112
0
11 Jul 2024
How Well Can a Long Sequence Model Model Long Sequences? Comparing
  Architechtural Inductive Biases on Long-Context Abilities
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities
Jerry Huang
52
7
0
11 Jul 2024
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for
  Few-Shot Class-Incremental Learning
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning
Xiaojie Li
Yibo Yang
Jianlong Wu
Bernard Ghanem
Liqiang Nie
Min Zhang
Mamba
36
5
0
08 Jul 2024
How Effective are State Space Models for Machine Translation?
How Effective are State Space Models for Machine Translation?
Hugo Pitorro
Pavlo Vasylenko
Marcos Vinícius Treviso
André F. T. Martins
Mamba
30
2
0
07 Jul 2024
Mamba Hawkes Process
Mamba Hawkes Process
Anningzhe Gao
Shan Dai
Yan Hu
Mamba
21
1
0
07 Jul 2024
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun
Xinhao Li
Karan Dalal
Jiarui Xu
Arjun Vikram
...
Xinlei Chen
Xiaolong Wang
Sanmi Koyejo
Tatsunori Hashimoto
Carlos Guestrin
45
89
0
05 Jul 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via
  Dynamic Sparse Attention
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
61
1
0
02 Jul 2024
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Boyuan Chen
Diego Marti Monso
Yilun Du
Max Simchowitz
Russ Tedrake
Vincent Sitzmann
DiffM
14
72
0
01 Jul 2024
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment
  Anything Model
Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model
Haobo Yuan
Xiangtai Li
Lu Qi
Tao Zhang
Ming Yang
Shuicheng Yan
Chen Change Loy
VLM
27
10
0
27 Jun 2024
From Efficient Multimodal Models to World Models: A Survey
From Efficient Multimodal Models to World Models: A Survey
Xinji Mai
Zeng Tao
Junxiong Lin
Haoran Wang
Yang Chang
Yanlan Kang
Yan Wang
Wenqiang Zhang
26
5
0
27 Jun 2024
Scalable Artificial Intelligence for Science: Perspectives, Methods and
  Exemplars
Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer
Aditya Kashi
Sajal Dash
A. Tsaris
Junqi Yin
Mallikarjun Shankar
Feiyi Wang
25
0
0
24 Jun 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large
  Language Models
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
33
45
0
24 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for
  Long-Range Transformers
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
19
18
0
24 Jun 2024
Building on Efficient Foundations: Effectively Training LLMs with
  Structured Feedforward Layers
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
Xiuying Wei
Skander Moalla
Razvan Pascanu
Çağlar Gülçehre
14
0
0
24 Jun 2024
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Ruoyu Qin
Zheming Li
Weiran He
Mingxing Zhang
Yongwei Wu
Weimin Zheng
Xinran Xu
29
51
0
24 Jun 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model
  Compression
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Tianyu Fu
Haofeng Huang
Xuefei Ning
Genghan Zhang
Boju Chen
...
Shiyao Li
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
31
2
0
21 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
58
12
0
20 Jun 2024
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language
  Models
VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models
Haowen Hou
Peigen Zeng
Fei Ma
Fei Richard Yu
VLM
24
0
0
19 Jun 2024
CherryRec: Enhancing News Recommendation Quality via LLM-driven
  Framework
CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework
Shaohuang Wang
Lun Wang
Yunhan Bu
Tianwei Huang
30
2
0
18 Jun 2024
MCSD: An Efficient Language Model with Diverse Fusion
MCSD: An Efficient Language Model with Diverse Fusion
Hua Yang
Duohai Li
Shiman Li
19
2
0
18 Jun 2024
A Scalable and Effective Alternative to Graph Transformers
A Scalable and Effective Alternative to Graph Transformers
Kaan Sancak
Zhigang Hua
Jin Fang
Yan Xie
Andrey Malevich
Bo Long
M. F. Balin
Ümit V. Çatalyürek
35
1
0
17 Jun 2024
CItruS: Chunked Instruction-aware State Eviction for Long Sequence
  Modeling
CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling
Yu Bai
Xiyuan Zou
Heyan Huang
Sanxing Chen
Marc-Antoine Rondeau
Yang Gao
Jackie Chi Kit Cheung
21
3
0
17 Jun 2024
Promises, Outlooks and Challenges of Diffusion Language Modeling
Promises, Outlooks and Challenges of Diffusion Language Modeling
Justin Deschenaux
Çağlar Gülçehre
DiffM
33
2
0
17 Jun 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM
  Inference with Adaptive Structured Sparse Attention
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu
Jiangfei Duan
Chang Chen
Siran Liu
Xiuhong Li
...
Huanqi Cao
Xiao Chuanfu
Xingcheng Zhang
Dahua Lin
Chao Yang
25
15
0
17 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context
  Reasoning-in-a-Haystack
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
ALM
LRM
ReLM
ELM
39
57
0
14 Jun 2024
Separations in the Representational Capabilities of Transformers and
  Recurrent Architectures
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra
Michael Hahn
Phil Blunsom
Varun Kanade
GNN
24
8
0
13 Jun 2024
Cognitively Inspired Energy-Based World Models
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
18
0
0
13 Jun 2024
Autoregressive Pretraining with Mamba in Vision
Autoregressive Pretraining with Mamba in Vision
Sucheng Ren
Xianhang Li
Haoqin Tu
Feng Wang
Fangxun Shu
...
L. Yang
Peng Wang
Heng Wang
Alan Yuille
Cihang Xie
Mamba
51
9
0
11 Jun 2024
MambaLRP: Explaining Selective State Space Sequence Models
MambaLRP: Explaining Selective State Space Sequence Models
F. Jafari
G. Montavon
Klaus-Robert Müller
Oliver Eberle
Mamba
43
9
0
11 Jun 2024
Previous
12345678
Next