ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.13048
  4. Cited By
RWKV: Reinventing RNNs for the Transformer Era

RWKV: Reinventing RNNs for the Transformer Era

22 May 2023
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
Stella Biderman
Huanqi Cao
Xin Cheng
Michael Chung
Matteo Grella
G. Kranthikiran
Xuming He
Haowen Hou
Jiaju Lin
Przemyslaw Kazienko
Jan Kocoñ
Jiaming Kong
Bartlomiej Koptyra
Hayden Lau
Krishna Sri Ipsit Mantri
Ferdinand Mom
Atsushi Saito
Guangyu Song
Xiangru Tang
Bolun Wang
J. S. Wind
Stansilaw Wozniak
Ruichong Zhang
Zhenyuan Zhang
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
ArXivPDFHTML

Papers citing "RWKV: Reinventing RNNs for the Transformer Era"

50 / 388 papers shown
Title
Mixture of Parrots: Experts improve memorization more than reasoning
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
18
3
0
24 Oct 2024
Frontiers in Intelligent Colonoscopy
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng-Tao Xu
Nick Barnes
F. Khan
Salman Khan
Deng-Ping Fan
41
4
0
22 Oct 2024
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Boxing Chen
Sarath Chandar
48
0
0
22 Oct 2024
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Hao Wu
Donglin Bai
Shiqi Jiang
Qianxi Zhang
Y. Yang
Ting Cao
Fengyuan Xu
Yunxin Liu
Fengyuan Xu
34
0
0
19 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide
  Image Analysis
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
26
4
0
18 Oct 2024
scFusionTTT: Single-cell transcriptomics and proteomics fusion with
  Test-Time Training layers
scFusionTTT: Single-cell transcriptomics and proteomics fusion with Test-Time Training layers
Dian Meng
Bohao Xing
Xinlei Huang
Yanran Liu
Yijun Zhou
Yongjun xiao
Zitong Yu
Xubin Zheng
21
1
0
17 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
33
18
0
17 Oct 2024
How much do contextualized representations encode long-range context?
How much do contextualized representations encode long-range context?
Simeng Sun
Cheng-Ping Hsieh
39
0
0
16 Oct 2024
An Automatic and Cost-Efficient Peer-Review Framework for Language
  Generation Evaluation
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qinyao Ai
Yiqun Liu
Min Zhang
Shaoping Ma
17
3
0
16 Oct 2024
VisualRWKV-HD and UHD: Advancing High-Resolution Processing for Visual
  Language Models
VisualRWKV-HD and UHD: Advancing High-Resolution Processing for Visual Language Models
Zihang Li
Haowen Hou
17
1
0
15 Oct 2024
Measuring Spiritual Values and Bias of Large Language Models
Measuring Spiritual Values and Bias of Large Language Models
Songyuan Liu
Ziyang Zhang
Runze Yan
Wei Wu
Carl Yang
Jiaying Lu
19
0
0
15 Oct 2024
Towards Better Multi-head Attention via Channel-wise Sample Permutation
Towards Better Multi-head Attention via Channel-wise Sample Permutation
Shen Yuan
Hongteng Xu
15
0
0
14 Oct 2024
Efficiently Scanning and Resampling Spatio-Temporal Tasks with Irregular
  Observations
Efficiently Scanning and Resampling Spatio-Temporal Tasks with Irregular Observations
Bryce Ferenczi
Michael G. Burke
Tom Drummond
21
0
0
11 Oct 2024
Parameter-Efficient Fine-Tuning of State Space Models
Parameter-Efficient Fine-Tuning of State Space Models
Kevin Galim
Wonjun Kang
Yuchen Zeng
H. Koo
Kangwook Lee
29
4
0
11 Oct 2024
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
Assessing Episodic Memory in LLMs with Sequence Order Recall Tasks
Mathis Pink
Vy A. Vo
Qinyuan Wu
Jianing Mu
Javier S. Turek
Uri Hasson
K. A. Norman
Sebastian Michelmann
Alexander G. Huth
Mariya Toneva
21
1
0
10 Oct 2024
Stuffed Mamba: State Collapse and State Capacity of RNN-Based
  Long-Context Modeling
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
Yingfa Chen
Xinrong Zhang
Shengding Hu
Xu Han
Zhiyuan Liu
Maosong Sun
Mamba
44
2
0
09 Oct 2024
MatMamba: A Matryoshka State Space Model
MatMamba: A Matryoshka State Space Model
Abhinav Shukla
Sai H. Vemprala
Aditya Kusupati
Ashish Kapoor
Mamba
21
0
0
09 Oct 2024
Towards Universality: Studying Mechanistic Similarity Across Language
  Model Architectures
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Junxuan Wang
Xuyang Ge
Wentao Shu
Qiong Tang
Yunhua Zhou
Zhengfu He
Xipeng Qiu
27
7
0
09 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient
  Attentions
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
Jianguo Li
Weiyao Lin
VLM
30
1
0
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
78
0
0
09 Oct 2024
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
Zilin Xiao
Hongming Zhang
Tao Ge
Siru Ouyang
Vicente Ordonez
Dong Yu
39
5
0
08 Oct 2024
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Jingwei Zuo
Maksim Velikanov
Dhia Eddine Rhaiem
Ilyas Chahed
Younes Belkada
Guillaume Kunsch
Hakim Hacid
ALM
52
12
0
07 Oct 2024
SPikE-SSM: A Sparse, Precise, and Efficient Spiking State Space Model
  for Long Sequences Learning
SPikE-SSM: A Sparse, Precise, and Efficient Spiking State Space Model for Long Sequences Learning
Yan Zhong
Ruoyu Zhao
Chao Wang
Qinghai Guo
Jianguo Zhang
Zhichao Lu
Luziwei Leng
26
2
0
07 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
On Efficient Variants of Segment Anything Model: A Survey
Xiaorui Sun
J. Liu
H. Shen
Xiaofeng Zhu
Ping Hu
VLM
40
4
0
07 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng
Yihang Gao
Han Shi
Jing Xiong
Jiankai Sun
...
Xiaozhe Ren
Michael Ng
Xin Jiang
Zhenguo Li
Yu Li
21
1
0
07 Oct 2024
Forgetting Curve: A Reliable Method for Evaluating Memorization
  Capability for Long-context Models
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Xinyu Liu
Runsong Zhao
Pengcheng Huang
Chunyang Xiao
Bei Li
Jingang Wang
Tong Xiao
Jingbo Zhu
16
0
0
07 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
46
13
0
06 Oct 2024
RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory
  Waveform Estimation from PPG Signals
RespDiff: An End-to-End Multi-scale RNN Diffusion Model for Respiratory Waveform Estimation from PPG Signals
Yuyang Miao
Zehua Chen
C. Li
Danilo P. Mandic
DiffM
MedIm
20
0
0
06 Oct 2024
Accelerating Inference of Networks in the Frequency Domain
Accelerating Inference of Networks in the Frequency Domain
Chenqiu Zhao
Guanfang Dong
Anup Basu
33
10
0
06 Oct 2024
LongGenBench: Long-context Generation Benchmark
LongGenBench: Long-context Generation Benchmark
Xiang Liu
Peijie Dong
Xuming Hu
Xiaowen Chu
RALM
28
8
0
05 Oct 2024
LoRTA: Low Rank Tensor Adaptation of Large Language Models
LoRTA: Low Rank Tensor Adaptation of Large Language Models
Ignacio Hounie
Charilaos I. Kanatsoulis
Arnuv Tandon
Alejandro Ribeiro
29
0
0
05 Oct 2024
Can Mamba Always Enjoy the "Free Lunch"?
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren
Zhicong Li
Yong Liu
34
1
0
04 Oct 2024
UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling
  for Retrieval-Augmented Generation
UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation
Zixuan Li
Jing Xiong
Fanghua Ye
Chuanyang Zheng
Xun Wu
...
Xiaodan Liang
Chengming Li
Zhenan Sun
Lingpeng Kong
Ngai Wong
RALM
UQLM
22
0
0
03 Oct 2024
How to Train Long-Context Language Models (Effectively)
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
55
36
0
03 Oct 2024
Were RNNs All We Needed?
Were RNNs All We Needed?
Leo Feng
Frederick Tung
Mohamed Osama Ahmed
Yoshua Bengio
Hossein Hajimirsadegh
AI4TS
18
14
1
02 Oct 2024
OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity
OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity
Junming Wang
Wei Yin
Xiaoxiao Long
Xingyu Zhang
Zebin Xing
Xiaoyang Guo
Qian Zhang
3DPC
34
2
0
30 Sep 2024
Small Language Models: Survey, Measurements, and Insights
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
47
31
0
24 Sep 2024
Towards LifeSpan Cognitive Systems
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
78
1
0
20 Sep 2024
Shaking Up VLMs: Comparing Transformers and Structured State Space
  Models for Vision & Language Modeling
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
Georgios Pantazopoulos
Malvina Nikandrou
Alessandro Suglia
Oliver Lemon
Arash Eshghi
Mamba
33
0
0
09 Sep 2024
Improving Pretraining Data Using Perplexity Correlations
Improving Pretraining Data Using Perplexity Correlations
Tristan Thrush
Christopher Potts
Tatsunori Hashimoto
32
17
0
09 Sep 2024
Experimentation in Content Moderation using RWKV
Experimentation in Content Moderation using RWKV
Umut Yildirim
Rohan Dutta
Burak Yildirim
Atharva Vaidya
33
2
0
05 Sep 2024
LinFusion: 1 GPU, 1 Minute, 16K Image
LinFusion: 1 GPU, 1 Minute, 16K Image
Songhua Liu
Weihao Yu
Zhenxiong Tan
Xinchao Wang
32
11
0
03 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
21
4
0
03 Sep 2024
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking
  State Space Models
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen
Chao Wang
Renzhuo Huang
Yan Zhong
Qinghai Guo
Zhichao Lu
Jianguo Zhang
Luziwei Leng
19
7
0
27 Aug 2024
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal
  Transformer-based Fusion Network for Multimodal Sentiment Analysis
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis
Yijie Jin
17
0
0
27 Aug 2024
Shifted Window Fourier Transform And Retention For Image Captioning
Shifted Window Fourier Transform And Retention For Image Captioning
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
VLM
16
0
0
25 Aug 2024
A Law of Next-Token Prediction in Large Language Models
A Law of Next-Token Prediction in Large Language Models
Hangfeng He
Weijie J. Su
19
5
0
24 Aug 2024
Scalable Autoregressive Image Generation with Mamba
Scalable Autoregressive Image Generation with Mamba
Haopeng Li
Jinyue Yang
Kexin Wang
Xuerui Qiu
Yuhong Chou
Xin Li
Guoqi Li
Mamba
37
12
0
22 Aug 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
43
24
0
19 Aug 2024
The Future of Open Human Feedback
The Future of Open Human Feedback
Shachar Don-Yehiya
Ben Burtenshaw
Ramon Fernandez Astudillo
Cailean Osborne
Mimansa Jaiswal
...
Omri Abend
Jennifer Ding
Sara Hooker
Hannah Rose Kirk
Leshem Choshen
VLM
ALM
44
1
0
15 Aug 2024
Previous
12345678
Next