Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
arXiv:2102.11174, 22 February 2021
Papers citing "Linear Transformers Are Secretly Fast Weight Programmers" (50 of 162 papers shown)
Accelerating Neural Self-Improvement via Bootstrapping. Kazuki Irie, Jürgen Schmidhuber. 02 May 2023.
Meta-Learned Models of Cognition. Marcel Binz, Ishita Dasgupta, A. Jagadish, M. Botvinick, Jane X. Wang, Eric Schulz. 12 Apr 2023.
POPGym: Benchmarking Partially Observable Reinforcement Learning. Steven D. Morad, Ryan Kortvelesy, Matteo Bettini, Stephan Liwicki, Amanda Prorok. 03 Mar 2023. [OffRL]
Permutation-Invariant Set Autoencoders with Fixed-Size Embeddings for Multi-Agent Learning. Ryan Kortvelesy, Steven D. Morad, Amanda Prorok. 24 Feb 2023. [AI4CE]
Hyena Hierarchy: Towards Larger Convolutional Language Models. Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré. 21 Feb 2023. [VLM]
Theory of coupled neuronal-synaptic dynamics. David G. Clark, L. F. Abbott. 17 Feb 2023.
Self-Organising Neural Discrete Representation Learning à la Kohonen. Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. 15 Feb 2023. [SSL]
Efficient Attention via Control Variates. Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong. 09 Feb 2023.
Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs. Y. Duan, Zhongfan Jia, Qian Li, Yi Zhong, Kaisheng Ma. 07 Feb 2023. [AAML]
Mnemosyne: Learning to Train Transformers with Transformers. Deepali Jain, K. Choromanski, Kumar Avinava Dubey, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan. 02 Feb 2023. [OffRL]
Simplex Random Features. Isaac Reid, K. Choromanski, Valerii Likhosherstov, Adrian Weller. 31 Jan 2023.
Learning One Abstract Bit at a Time Through Self-Invented Experiments Encoded as Neural Networks. Vincent Herrmann, Louis Kirsch, Jürgen Schmidhuber. 29 Dec 2022. [AI4CE]
On Transforming Reinforcement Learning by Transformer: The Development Trajectory. Shengchao Hu, Li Shen, Ya-Qin Zhang, Yixin Chen, Dacheng Tao. 29 Dec 2022. [OffRL]
Annotated History of Modern AI and Deep Learning. Juergen Schmidhuber. 21 Dec 2022. [MLAU, AI4TS, AI4CE]
Transformers learn in-context by gradient descent. J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov. 15 Dec 2022. [MLT]
Meta-Learning Fast Weight Language Models. Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey E. Hinton, Mohammad Norouzi. 05 Dec 2022. [KELM]
What learning algorithm is in-context learning? Investigations with linear models. Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou. 28 Nov 2022.
Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks. Kazuki Irie, Jürgen Schmidhuber. 17 Nov 2022. [KELM]
Characterizing Verbatim Short-Term Memory in Neural Language Models. K. Armeni, C. Honey, Tal Linzen. 24 Oct 2022. [KELM, RALM]
Modeling Context With Linear Attention for Scalable Document-Level Translation. Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith. 16 Oct 2022.
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling. Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong. 14 Oct 2022. [3DV]
Designing Robust Transformers using Robust Kernel Density Estimation. Xing Han, Tongzheng Ren, T. Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho. 11 Oct 2022.
LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models. A. Konstantinov, Lev V. Utkin. 11 Oct 2022.
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights. H. H. Mao. 09 Oct 2022.
Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules. Kazuki Irie, Jürgen Schmidhuber. 07 Oct 2022.
Deep is a Luxury We Don't Have. Ahmed Taha, Yen Nhi Truong Vu, Brent Mombourquette, Thomas P. Matthews, Jason Su, Sadanand Singh. 11 Aug 2022. [ViT, MedIm]
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter. Aleksandar Stanić, Yujin Tang, David R Ha, Jürgen Schmidhuber. 05 Aug 2022. [ELM]
AGBoost: Attention-based Modification of Gradient Boosting Machine. A. Konstantinov, Lev V. Utkin, Stanislav R. Kirpichenko. 12 Jul 2022. [ODL]
Attention and Self-Attention in Random Forests. Lev V. Utkin, A. Konstantinov. 09 Jul 2022.
Goal-Conditioned Generators of Deep Policies. Francesco Faccio, Vincent Herrmann, Aditya A. Ramesh, Louis Kirsch, Jürgen Schmidhuber. 04 Jul 2022. [OffRL]
Rethinking Query-Key Pairwise Interactions in Vision Transformers. Cheng-rong Li, Yangxin Liu. 01 Jul 2022.
Short-Term Plasticity Neurons Learning to Learn and Forget. Hector Garcia Rodriguez, Qinghai Guo, Timoleon Moraitis. 28 Jun 2022.
Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules. Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber. 03 Jun 2022. [AI4TS]
Transformer with Fourier Integral Attentions. T. Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho. 01 Jun 2022.
BayesPCN: A Continually Learnable Predictive Coding Associative Memory. Jason Yoo, F. Wood. 20 May 2022. [KELM]
Minimal Neural Network Models for Permutation Invariant Agents. J. Pedersen, S. Risi. 12 May 2022.
A Call for Clarity in Beam Search: How It Works and When It Stops. Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith. 11 Apr 2022.
Linear Complexity Randomized Self-attention Mechanism. Lin Zheng, Chong-Jun Wang, Lingpeng Kong. 10 Apr 2022.
On the link between conscious function and general intelligence in humans and machines. Arthur Juliani, Kai Arulkumaran, Shuntaro Sasai, Ryota Kanai. 24 Mar 2022.
Linearizing Transformer with Key-Value Memory. Yizhe Zhang, Deng Cai. 23 Mar 2022.
FAR: Fourier Aerial Video Recognition. D. Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming-Shun Lin, Dinesh Manocha. 21 Mar 2022.
Block-Recurrent Transformers. DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur. 11 Mar 2022.
The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention. Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber. 11 Feb 2022.
A Modern Self-Referential Weight Matrix That Learns to Modify Itself. Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. 11 Feb 2022.
Latency Adjustable Transformer Encoder for Language Understanding. Sajjad Kachuee, M. Sharifkhani. 10 Jan 2022.
Attention-based Random Forest and Contamination Model. Lev V. Utkin, A. Konstantinov. 08 Jan 2022.
Simple Local Attentions Remain Competitive for Long-Context Tasks. Wenhan Xiong, Barlas Ouguz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad. 14 Dec 2021.
Attention Approximates Sparse Distributed Memory. Trenton Bricken, C. Pehlevan. 10 Nov 2021.
Improving Transformers with Probabilistic Attention Keys. Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher. 16 Oct 2021.
On Learning the Transformer Kernel. Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan. 15 Oct 2021. [ViT]