Linear Transformers Are Secretly Fast Weight Programmers

22 February 2021
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
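The title refers to the paper's central observation: linear (kernelized) attention is equivalent to an outer-product "fast weight" memory that key/value pairs write into and queries read out of, and the paper improves the write with a delta-rule-style update. Below is a minimal NumPy sketch of one such step, shown only for orientation; the feature map, normalization choices, and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def phi(x):
    # Illustrative positive feature map (ELU + 1); the paper uses its own
    # deterministic feature map, which is omitted here for brevity.
    return np.where(x > 0, x + 1.0, np.exp(x))

def fast_weight_step(W, k, v, q, beta=1.0):
    """One autoregressive step of linear attention viewed as fast weight
    programming: the key/value pair writes into W via an outer product
    (with a delta-rule correction), and the query reads the output back out."""
    phi_k, phi_q = phi(k), phi(q)
    v_old = W @ phi_k                                # value currently stored for this key
    W = W + beta * np.outer(v - v_old, phi_k)        # delta-rule write into the fast weights
    y = W @ phi_q                                    # read with the query
    return W, y

# Toy usage with d_k = d_v = 4 (hypothetical dimensions).
rng = np.random.default_rng(0)
W = np.zeros((4, 4))
for _ in range(3):
    k, v, q = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
    W, y = fast_weight_step(W, k, v, q)
```

Setting beta to 1 and dropping the `v_old` correction recovers the purely additive update that corresponds to vanilla linear attention (without its sum normalizer, which is left out of this sketch).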

Papers citing "Linear Transformers Are Secretly Fast Weight Programmers"

Showing 12 of 162 citing papers:

Hybrid Random Features
K. Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, ..., Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller
08 Oct 2021

ABC: Attention with Bounded-memory Control
Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
06 Oct 2021

Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng, Huijie Pan, Lingpeng Kong
06 Oct 2021

Learning with Holographic Reduced Representations
Ashwinkumar Ganesan, Hang Gao, S. Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean
05 Sep 2021

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
26 Aug 2021

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
11 Jun 2021

Staircase Attention for Recurrent Processing of Sequences
Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
08 Jun 2021

Choose a Transformer: Fourier or Galerkin
Shuhao Cao
31 May 2021

LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
17 Feb 2021

Meta Learning Backpropagation And Improving It
Louis Kirsch, Jürgen Schmidhuber
29 Dec 2020

On the Binding Problem in Artificial Neural Networks
Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber
09 Dec 2020

A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
06 Jun 2016