ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

RWKV: Reinventing RNNs for the Transformer Era (arXiv:2305.13048)

22 May 2023
Bo Peng, Eric Alcaide, Quentin G. Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, G. Kranthikiran, Xuming He, Haowen Hou, Jiaju Lin, Przemyslaw Kazienko, Jan Kocoń, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Bolun Wang, J. S. Wind, Stanislaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, P. Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu

Papers citing "RWKV: Reinventing RNNs for the Transformer Era"

38 / 388 papers shown
• EarthPT: a time series foundation model for Earth Observation (13 Sep 2023)
  Michael J. Smith, Luke Fleming, James E. Geach · AI4TS · 14 / 7 / 0

• Auto-Regressive Next-Token Predictors are Universal Learners (13 Sep 2023)
  Eran Malach · LRM · 6 / 35 / 0

• Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs (06 Sep 2023)
  Chao Feng, Xinyu Zhang, Zichu Fei · KELM · 15 / 28 / 0

• Gated recurrent neural networks discover attention (04 Sep 2023)
  Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, J. Oswald, Maxime Larcher, Angelika Steger, João Sacramento · 18 / 8 / 0

• Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models (27 Aug 2023)
  Kaiyuan Gao, Su He, Zhenyu He, Jiacheng Lin, Qizhi Pei, Jie Shao, Wei Zhang · LM&MA, SyDa · 15 / 4 / 0

• DARWIN Series: Domain Specific Large Language Models for Natural Science (25 Aug 2023)
  Tong Xie, Yuwei Wan, Wei Huang, Zhenyu Yin, Yixuan Liu, ..., Chunyu Kit, Clara Grazian, Wenjie Zhang, Imran Razzak, B. Hoex · ELM, ALM, AI4CE · 17 / 22 / 0

• Sparks of Large Audio Models: A Survey and Outlook (24 Aug 2023)
  S. Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, ..., Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller · LM&MA, AuLLM · 21 / 36 / 0

• Stabilizing RNN Gradients through Pre-training (23 Aug 2023)
  Luca Herranz-Celotti, Jean Rouat · 14 / 0 / 0

• LKPNR: LLM and KG for Personalized News Recommendation Framework (23 Aug 2023)
  Hao Chen, Runfeng Xie, Xia Cui, Zhou Yan, Wang Xin, Zhanwei Xuan, Kai Zhang · AI4TS · 8 / 16 / 0

• Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? (20 Aug 2023)
  Kai Sun, Y. Xu, Hanwen Zha, Yue Liu, Xinhsuai Dong · AI4MH · 14 / 130 / 0

• OctoPack: Instruction Tuning Code Large Language Models (14 Aug 2023)
  Niklas Muennighoff, Qian Liu, A. Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre · VLM, ALM · 47 / 116 / 0

• Universal Approximation of Linear Time-Invariant (LTI) Systems through RNNs: Power of Randomness in Reservoir Computing (04 Aug 2023)
  Shashank Jere, Lizhong Zheng, Karim A. Said, Lingjia Liu · 6 / 2 / 0

• TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer (27 Jul 2023)
  Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, ..., Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong · 28 / 15 / 0

• Fading memory as inductive bias in residual recurrent networks (27 Jul 2023)
  I. Dubinin, Felix Effenberger · 23 / 4 / 0

• Evaluating Large Language Models for Radiology Natural Language Processing (25 Jul 2023)
  Zheng Liu, Tianyang Zhong, Yiwei Li, Yutong Zhang, Yirong Pan, ..., Shijie Zhao, Quanzheng Li, Hongtu Zhu, Dinggang Shen, Tianming Liu · LM&MA, ELM · 38 / 6 / 0

• Emotional Intelligence of Large Language Models (18 Jul 2023)
  Xuena Wang, Xueting Li, Zi Yin, Yue Wu (Tsinghua University) · 14 / 71 / 0

• Retentive Network: A Successor to Transformer for Large Language Models (17 Jul 2023)
  Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei · LRM · 29 / 300 / 0

• CMATH: Can Your Language Model Pass Chinese Elementary School Math Test? (29 Jun 2023)
  Tianwen Wei, Jian Luan, W. Liu, Shuang Dong, B. Wang · ELM · 20 / 30 / 0

• Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models (25 Jun 2023)
  Yinyu Lan, Yanru Wu, Wang Xu, Weiqiang Feng, Youhao Zhang · 18 / 3 / 0

• Exposing Attention Glitches with Flip-Flop Language Modeling (01 Jun 2023)
  Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang · LRM · 19 / 46 / 0

• Memory Efficient Neural Processes via Constant Memory Attention Block (23 May 2023)
  Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed · 8 / 5 / 0

• Scaling Transformer to 1M tokens and beyond with RMT (19 Apr 2023)
  Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail Burtsev · LRM · 11 / 86 / 0

• Resurrecting Recurrent Neural Networks for Long Sequences (11 Mar 2023)
  Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De · 83 / 258 / 0

• SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks (27 Feb 2023)
  Rui-Jie Zhu, Qihang Zhao, Guoqi Li, Jason Eshraghian · BDL, VLM · 14 / 80 / 0

• TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching (07 Jan 2023)
  Shangda Wu, Xiaobing Li, Feng Yu, Maosong Sun · 19 / 11 / 0

• Cramming: Training a Language Model on a Single GPU in One Day (28 Dec 2022)
  Jonas Geiping, Tom Goldstein · MoE · 20 / 83 / 0

• Deanthropomorphising NLP: Can a Language Model Be Conscious? (21 Nov 2022)
  Matthew Shardlow, Piotr Przybyła · 17 / 4 / 0

• What Language Model to Train if You Have One Million GPU Hours? (27 Oct 2022)
  Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, ..., Lintang Sutawika, Jaesung Tae, Zheng-Xin Yong, Julien Launay, Iz Beltagy · MoE, AI4CE · 212 / 103 / 0

• Broken Neural Scaling Laws (26 Oct 2022)
  Ethan Caballero, Kshitij Gupta, Irina Rish, David M. Krueger · 11 / 74 / 0

• Training language models to follow instructions with human feedback (04 Mar 2022)
  Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe · OSLM, ALM · 301 / 11,730 / 0

• Primer: Searching for Efficient Transformers for Language Modeling (17 Sep 2021)
  David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le · VLM · 83 / 149 / 0

• Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (27 Aug 2021)
  Ofir Press, Noah A. Smith, M. Lewis · 234 / 690 / 0

• MLP-Mixer: An all-MLP Architecture for Vision (04 May 2021)
  Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 239 / 2,554 / 0

• Creativity and Machine Learning: A Survey (06 Apr 2021)
  Giorgio Franceschelli, Mirco Musolesi · VLM, AI4CE · 16 / 32 / 0

• The Pile: An 800GB Dataset of Diverse Text for Language Modeling (31 Dec 2020)
  Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy · AIMat · 236 / 1,508 / 0

• Big Bird: Transformers for Longer Sequences (28 Jul 2020)
  Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · VLM · 246 / 1,982 / 0

• Scaling Laws for Neural Language Models (23 Jan 2020)
  Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 220 / 3,054 / 0

• GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018)
  Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 294 / 6,927 / 0