arXiv 2102.11174
Cited By
Linear Transformers Are Secretly Fast Weight Programmers
22 February 2021
Imanol Schlag
Kazuki Irie
Jürgen Schmidhuber
Papers citing "Linear Transformers Are Secretly Fast Weight Programmers"
50 / 162 papers shown
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zhen Qin
Xuyang Shen
Weigao Sun
Dong Li
Stanley T. Birchfield
Richard I. Hartley
Yiran Zhong
50
6
0
27 May 2024
Rethinking Transformers in Solving POMDPs
Chenhao Lu
Ruizhe Shi
Yuyao Liu
Kaizhe Hu
Simon S. Du
Huazhe Xu
AI4CE
27
2
0
27 May 2024
On Understanding Attention-Based In-Context Learning for Categorical Data
Aaron T. Wang
William Convertino
Xiang Cheng
Ricardo Henao
Lawrence Carin
51
0
0
27 May 2024
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Jerome Sieber
Carmen Amo Alonso
A. Didier
M. Zeilinger
Antonio Orvieto
AAML
44
8
0
24 May 2024
HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin
Songlin Yang
Weixuan Sun
Xuyang Shen
Dong Li
Weigao Sun
Yiran Zhong
LRM
44
47
0
11 Apr 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai
Manaal Faruqui
Siddharth Gopal
LRM
LLMAG
CLL
85
103
0
10 Apr 2024
Structurally Flexible Neural Networks: Evolving the Building Blocks for General Agents
J. Pedersen
Erwan Plantec
Eleni Nisioti
Milton L. Montero
Sebastian Risi
36
1
0
06 Apr 2024
Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu
Wentian Zhang
Jinheng Xie
Francesco Faccio
Mengmeng Xu
Tao Xiang
Mike Zheng Shou
Juan-Manuel Perez-Rua
Jürgen Schmidhuber
DiffM
67
19
0
03 Apr 2024
Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli
Armin W. Thomas
Eric N. D. Nguyen
Pragaash Ponnusamy
Bjorn Deiseroth
...
Brian Hie
Stefano Ermon
Christopher Ré
Ce Zhang
Stefano Massaroli
MoE
49
21
0
26 Mar 2024
Learning Useful Representations of Recurrent Neural Network Weight Matrices
Vincent Herrmann
Francesco Faccio
Jürgen Schmidhuber
21
7
0
18 Mar 2024
Transfer Learning Beyond Bounded Density Ratios
Alkis Kalavasis
Ilias Zadik
Manolis Zampetakis
33
4
0
18 Mar 2024
Learning Associative Memories with Gradient Descent
Vivien A. Cabannes
Berfin Simsek
A. Bietti
38
6
0
28 Feb 2024
PIDformer: Transformer Meets Control Theory
Tam Nguyen
César A. Uribe
Tan-Minh Nguyen
Richard G. Baraniuk
48
7
0
25 Feb 2024
Linear Transformers are Versatile In-Context Learners
Max Vladymyrov
J. Oswald
Mark Sandler
Rong Ge
28
13
0
21 Feb 2024
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Yaroslav Aksenov
Nikita Balagansky
Sofia Maria Lo Cicero Vaina
Boris Shaposhnikov
Alexey Gorbatovski
Daniil Gavrilov
KELM
33
5
0
16 Feb 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi
Michele Casoni
Alessandro Betti
Tommaso Guidi
Marco Gori
S. Melacci
19
9
0
12 Feb 2024
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang
Kush S. Bhatia
Hermann Kumbong
Christopher Ré
27
47
0
06 Feb 2024
HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction
Harvie Zhang
31
0
0
31 Jan 2024
Superiority of Multi-Head Attention in In-Context Linear Regression
Yingqian Cui
Jie Ren
Pengfei He
Jiliang Tang
Yue Xing
37
12
0
30 Jan 2024
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
26
1
0
18 Dec 2023
Delving Deeper Into Astromorphic Transformers
Md. Zesun Ahmed Mia
Malyaban Bal
Abhronil Sengupta
34
1
0
18 Dec 2023
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Róbert Csordás
Piotr Piekos
Kazuki Irie
Jürgen Schmidhuber
MoE
22
14
0
13 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang
Bailin Wang
Yikang Shen
Rameswar Panda
Yoon Kim
42
140
0
11 Dec 2023
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng
Yuxin Chen
S. Sra
18
35
0
11 Dec 2023
MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition
Nicolas Menet
Michael Hersche
G. Karunaratne
Luca Benini
Abu Sebastian
Abbas Rahimi
28
13
0
05 Dec 2023
SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Isabel Leal
Krzysztof Choromanski
Deepali Jain
Kumar Avinava Dubey
Jake Varley
...
Q. Vuong
Tamás Sarlós
Kenneth Oslund
Karol Hausman
Kanishka Rao
36
8
0
04 Dec 2023
Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Tam Nguyen
Tan-Minh Nguyen
Richard G. Baraniuk
21
8
0
01 Dec 2023
Efficient Rotation Invariance in Deep Neural Networks through Artificial Mental Rotation
Lukas Tuggener
Thilo Stadelmann
Jürgen Schmidhuber
OOD
13
1
0
14 Nov 2023
Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin
Songlin Yang
Yiran Zhong
36
74
0
08 Nov 2023
p-Laplacian Transformer
Tuan Nguyen
Tam Nguyen
Vinh-Tiep Nguyen
Tan-Minh Nguyen
79
0
0
06 Nov 2023
Simplifying Transformer Blocks
Bobby He
Thomas Hofmann
25
30
0
03 Nov 2023
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions
Kazuki Irie
Róbert Csordás
Jürgen Schmidhuber
28
11
0
24 Oct 2023
Learning to (Learn at Test Time)
Yu Sun
Xinhao Li
Karan Dalal
Chloe Hsu
Oluwasanmi Koyejo
Carlos Guestrin
Xiaolong Wang
Tatsunori Hashimoto
Xinlei Chen
SSL
30
6
0
20 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
22
18
0
16 Oct 2023
Do pretrained Transformers Learn In-Context by Gradient Descent?
Lingfeng Shen
Aayush Mishra
Daniel Khashabi
27
7
0
12 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
23
0
0
11 Oct 2023
Reinforcement Learning with Fast and Forgetful Memory
Steven D. Morad
Ryan Kortvelesy
Stephan Liwicki
Amanda Prorok
OffRL
11
4
0
06 Oct 2023
Scaling Laws for Associative Memories
Vivien A. Cabannes
Elvis Dohmatob
A. Bietti
11
19
0
04 Oct 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
38
9
0
20 Sep 2023
Uncovering mesa-optimization algorithms in Transformers
J. Oswald
Eyvind Niklasson
Maximilian Schlegel
Seijin Kobayashi
Nicolas Zucchet
...
Mark Sandler
Blaise Agüera y Arcas
Max Vladymyrov
Razvan Pascanu
João Sacramento
24
53
0
11 Sep 2023
Gated recurrent neural networks discover attention
Nicolas Zucchet
Seijin Kobayashi
Yassir Akram
J. Oswald
Maxime Larcher
Angelika Steger
João Sacramento
31
8
0
04 Sep 2023
Recurrent Attention Networks for Long-text Modeling
Xianming Li
Zongxi Li
Xiaotian Luo
Haoran Xie
Xing Lee
Yingbin Zhao
Fu Lee Wang
Qing Li
RALM
28
15
0
12 Jun 2023
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
21
81
0
01 Jun 2023
Transformers learn to implement preconditioned gradient descent for in-context learning
Kwangjun Ahn
Xiang Cheng
Hadi Daneshmand
S. Sra
ODL
17
147
0
01 Jun 2023
Exploring the Promise and Limits of Real-Time Recurrent Learning
Kazuki Irie
Anand Gopalakrishnan
Jürgen Schmidhuber
19
15
0
30 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurélien Lucchi
Thomas Hofmann
34
53
0
25 May 2023
Contrastive Training of Complex-Valued Autoencoders for Object Discovery
Aleksandar Stanić
Anand Gopalakrishnan
Kazuki Irie
Jürgen Schmidhuber
OCL
28
14
0
24 May 2023
Brain-inspired learning in artificial neural networks: a review
Samuel Schmidgall
Jascha Achterberg
Thomas Miconi
Louis Kirsch
Rojin Ziaei
S. P. Hajiseyedrazi
Jason Eshraghian
28
52
0
18 May 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
21
84
0
12 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
27
46
0
10 May 2023