Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
22 February 2021 · arXiv:2102.11174
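
The title refers to the paper's core observation: causal linear (softmax-free) attention is equivalent to a fast weight programmer that, at each step, writes the outer product of the value and the feature-mapped key into a weight matrix and reads it out with the feature-mapped query. The NumPy sketch below illustrates only that equivalence; the function name is a hypothetical stand-in, the ReLU feature map replaces the DPFP map proposed in the paper, and the paper's delta-rule update variant is omitted.

    import numpy as np

    def linear_attention_fast_weights(Q, K, V, phi=lambda x: np.maximum(x, 0.0)):
        """Causal linear attention computed as a fast weight programmer:
        at step t, write the outer product V[t] phi(K[t])^T into the fast
        weight matrix W, then read out with phi(Q[t])."""
        d_in, d_out = phi(K[0]).shape[0], V.shape[1]
        W = np.zeros((d_out, d_in))   # fast weights, "programmed" by the sequence
        z = np.zeros(d_in)            # running sum of phi(k) for normalization
        out = []
        for q, k, v in zip(Q, K, V):
            W += np.outer(v, phi(k))  # additive Hebbian-style write
            z += phi(k)
            out.append(W @ phi(q) / (phi(q) @ z + 1e-9))  # normalized read-out
        return np.stack(out)

    # Tiny usage example on random toy data.
    rng = np.random.default_rng(0)
    T, d = 6, 4
    y = linear_attention_fast_weights(rng.normal(size=(T, d)),
                                      rng.normal(size=(T, d)),
                                      rng.normal(size=(T, d)))
    print(y.shape)  # (6, 4)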

Papers citing "Linear Transformers Are Secretly Fast Weight Programmers"

Showing 50 of 162 citing papers:

  • Accelerating Neural Self-Improvement via Bootstrapping · Kazuki Irie, Jürgen Schmidhuber · 02 May 2023
  • Meta-Learned Models of Cognition · Marcel Binz, Ishita Dasgupta, A. Jagadish, M. Botvinick, Jane X. Wang, Eric Schulz · 12 Apr 2023
  • POPGym: Benchmarking Partially Observable Reinforcement Learning · Steven D. Morad, Ryan Kortvelesy, Matteo Bettini, Stephan Liwicki, Amanda Prorok · OffRL · 03 Mar 2023
  • Permutation-Invariant Set Autoencoders with Fixed-Size Embeddings for Multi-Agent Learning · Ryan Kortvelesy, Steven D. Morad, Amanda Prorok · AI4CE · 24 Feb 2023
  • Hyena Hierarchy: Towards Larger Convolutional Language Models · Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré · VLM · 21 Feb 2023
  • Theory of coupled neuronal-synaptic dynamics · David G. Clark, L. F. Abbott · 17 Feb 2023
  • Self-Organising Neural Discrete Representation Learning à la Kohonen · Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · SSL · 15 Feb 2023
  • Efficient Attention via Control Variates · Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong · 09 Feb 2023
  • Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs · Y. Duan, Zhongfan Jia, Qian Li, Yi Zhong, Kaisheng Ma · AAML · 07 Feb 2023
  • Mnemosyne: Learning to Train Transformers with Transformers · Deepali Jain, K. Choromanski, Kumar Avinava Dubey, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan · OffRL · 02 Feb 2023
  • Simplex Random Features · Isaac Reid, K. Choromanski, Valerii Likhosherstov, Adrian Weller · 31 Jan 2023
  • Learning One Abstract Bit at a Time Through Self-Invented Experiments Encoded as Neural Networks · Vincent Herrmann, Louis Kirsch, Jürgen Schmidhuber · AI4CE · 29 Dec 2022
  • On Transforming Reinforcement Learning by Transformer: The Development Trajectory · Shengchao Hu, Li Shen, Ya-Qin Zhang, Yixin Chen, Dacheng Tao · OffRL · 29 Dec 2022
  • Annotated History of Modern AI and Deep Learning · Juergen Schmidhuber · MLAU, AI4TS, AI4CE · 21 Dec 2022
  • Transformers learn in-context by gradient descent · J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov · MLT · 15 Dec 2022
  • Meta-Learning Fast Weight Language Models · Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey E. Hinton, Mohammad Norouzi · KELM · 05 Dec 2022
  • What learning algorithm is in-context learning? Investigations with linear models · Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou · 28 Nov 2022
  • Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks · Kazuki Irie, Jürgen Schmidhuber · KELM · 17 Nov 2022
  • Characterizing Verbatim Short-Term Memory in Neural Language Models · K. Armeni, C. Honey, Tal Linzen · KELM, RALM · 24 Oct 2022
  • Modeling Context With Linear Attention for Scalable Document-Level Translation · Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith · 16 Oct 2022
  • CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling · Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong · 3DV · 14 Oct 2022
  • Designing Robust Transformers using Robust Kernel Density Estimation · Xing Han, Tongzheng Ren, T. Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho · 11 Oct 2022
  • LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models · A. Konstantinov, Lev V. Utkin · 11 Oct 2022
  • Fine-Tuning Pre-trained Transformers into Decaying Fast Weights · H. H. Mao · 09 Oct 2022
  • Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules · Kazuki Irie, Jürgen Schmidhuber · 07 Oct 2022
  • Deep is a Luxury We Don't Have · Ahmed Taha, Yen Nhi Truong Vu, Brent Mombourquette, Thomas P. Matthews, Jason Su, Sadanand Singh · ViT, MedIm · 11 Aug 2022
  • Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter · Aleksandar Stanić, Yujin Tang, David R Ha, Jürgen Schmidhuber · ELM · 05 Aug 2022
  • AGBoost: Attention-based Modification of Gradient Boosting Machine · A. Konstantinov, Lev V. Utkin, Stanislav R. Kirpichenko · ODL · 12 Jul 2022
  • Attention and Self-Attention in Random Forests · Lev V. Utkin, A. Konstantinov · 09 Jul 2022
  • Goal-Conditioned Generators of Deep Policies · Francesco Faccio, Vincent Herrmann, Aditya A. Ramesh, Louis Kirsch, Jürgen Schmidhuber · OffRL · 04 Jul 2022
  • Rethinking Query-Key Pairwise Interactions in Vision Transformers · Cheng-rong Li, Yangxin Liu · 01 Jul 2022
  • Short-Term Plasticity Neurons Learning to Learn and Forget · Hector Garcia Rodriguez, Qinghai Guo, Timoleon Moraitis · 28 Jun 2022
  • Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules · Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber · AI4TS · 03 Jun 2022
  • Transformer with Fourier Integral Attentions · T. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho · 01 Jun 2022
  • BayesPCN: A Continually Learnable Predictive Coding Associative Memory · Jason Yoo, F. Wood · KELM · 20 May 2022
  • Minimal Neural Network Models for Permutation Invariant Agents · J. Pedersen, S. Risi · 12 May 2022
  • A Call for Clarity in Beam Search: How It Works and When It Stops · Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith · 11 Apr 2022
  • Linear Complexity Randomized Self-attention Mechanism · Lin Zheng, Chong-Jun Wang, Lingpeng Kong · 10 Apr 2022
  • On the link between conscious function and general intelligence in humans and machines · Arthur Juliani, Kai Arulkumaran, Shuntaro Sasai, Ryota Kanai · 24 Mar 2022
  • Linearizing Transformer with Key-Value Memory · Yizhe Zhang, Deng Cai · 23 Mar 2022
  • FAR: Fourier Aerial Video Recognition · D. Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming-Shun Lin, Dinesh Manocha · 21 Mar 2022
  • Block-Recurrent Transformers · DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur · 11 Mar 2022
  • The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention · Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022
  • A Modern Self-Referential Weight Matrix That Learns to Modify Itself · Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022
  • Latency Adjustable Transformer Encoder for Language Understanding · Sajjad Kachuee, M. Sharifkhani · 10 Jan 2022
  • Attention-based Random Forest and Contamination Model · Lev V. Utkin, A. Konstantinov · 08 Jan 2022
  • Simple Local Attentions Remain Competitive for Long-Context Tasks · Wenhan Xiong, Barlas Ouguz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad · 14 Dec 2021
  • Attention Approximates Sparse Distributed Memory · Trenton Bricken, C. Pehlevan · 10 Nov 2021
  • Improving Transformers with Probabilistic Attention Keys · Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher · 16 Oct 2021
  • On Learning the Transformer Kernel · Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan · ViT · 15 Oct 2021