Linear Transformers Are Secretly Fast Weight Programmers
22 February 2021 (arXiv:2102.11174)
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

Papers citing "Linear Transformers Are Secretly Fast Weight Programmers"

Showing 50 of 162 citing papers (title, authors, date, topic tags where available):
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
01 May 2025 (MoE, VLM)
RWKV-X: A Linear Complexity Hybrid Language Model
Haowen Hou, Zhiyi Huang, Kaifeng Tan, Rongchang Lu, Fei Richard Yu
30 Apr 2025 (VLM)
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
Nischal Mainali, Lucas Teixeira
17 Apr 2025
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni
17 Apr 2025
Bidirectional Linear Recurrent Models for Sequence-Level Multisource Fusion
Qisai Liu, Zhanhong Jiang, Joshua R. Waite, Chao Liu, Aditya Balu, S. Sarkar
11 Apr 2025 (AI4TS)
One-Minute Video Generation with Test-Time Training
Karan Dalal, Daniel Koceja, Gashon Hussein, Jiarui Xu, Yue Zhao, ..., Tatsunori Hashimoto, Sanmi Koyejo, Yejin Choi, Yu Sun, Xiaolong Wang
07 Apr 2025 (ViT)
Decoding Recommendation Behaviors of In-Context Learning LLMs Through Gradient Descent
Yi Xu, Weicong Qin, Weijie Yu, Ming He, Jianping Fan, Jun Xu
06 Apr 2025
ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
Nicola Muca Cirone, C. Salvi
01 Apr 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck, Korbinian Poppel, Phillip Lippe, Sepp Hochreiter
18 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar, Asim Kadav
17 Mar 2025 (VLM)
Measuring In-Context Computation Complexity via Hidden State Prediction
Vincent Herrmann, Róbert Csordás, Jürgen Schmidhuber
17 Mar 2025
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto
13 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu-Xi Cheng
07 Mar 2025 (MoE)
Associative Recurrent Memory Transformer
Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev
17 Feb 2025
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
28 Jan 2025 (MLT)
Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao
11 Jan 2025
Key-value memory in the brain
Samuel J. Gershman, Ila Fiete, Kazuki Irie
06 Jan 2025
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Riccardo Grazzi, Julien N. Siems, Jörg K.H. Franke, Arber Zela, Frank Hutter, Massimiliano Pontil
19 Nov 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, J. Wu, Bo Xu, Guoqi Li
16 Nov 2024
StreamAdapter: Efficient Test Time Adaptation from Contextual Streams
Dilxat Muhtar, Yelong Shen, Y. Yang, Xiaodong Liu, Yadong Lu, ..., Feng Sun, Xueliang Zhang, Jianfeng Gao, Weizhu Chen, Qi Zhang
14 Nov 2024 (TTA)
Automatic Album Sequencing
Vincent Herrmann, Dylan R. Ashley, Jürgen Schmidhuber
12 Nov 2024
Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng
08 Nov 2024 (KELM)
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü, Destiny Okpekpe, Antonio Orvieto
31 Oct 2024 (Mamba)
A Walsh Hadamard Derived Linear Vector Symbolic Architecture
Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff, Stella Biderman, Tim Oates, James Holt
30 Oct 2024 (LLMSV)
On the Role of Depth and Looping for In-Context Learning with Task Diversity
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
29 Oct 2024
FACTS: A Factored State-Space Framework For World Modelling
Li Nanbo, Firas Laakom, Yucheng Xu, Wenyi Wang, Jürgen Schmidhuber
28 Oct 2024 (AI4TS)
Graph Transformers Dream of Electric Flow
Xiang Cheng, Lawrence Carin, S. Sra
22 Oct 2024
Quadratic Gating Functions in Mixture of Experts: A Statistical Insight
Pedram Akbarian, Huy Le Nguyen, Xing Han, Nhat Ho
15 Oct 2024 (MoE)
Neural networks that overcome classic challenges through practice
Kazuki Irie, Brenden M. Lake
14 Oct 2024
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
H. Le, Kien Do, D. Nguyen, Sunil Gupta, Svetha Venkatesh
14 Oct 2024
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
10 Oct 2024
DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning
Yichen Song, Yunbo Wang, Xiaokang Yang
08 Oct 2024 (AI4CE)
xLSTM-FER: Enhancing Student Expression Recognition with Extended Vision Long Short-Term Memory Network
Qionghao Huang, Jili Chen
07 Oct 2024
S7: Selective and Simplified State Space Layers for Sequence Modeling
Taylan Soydan, Nikola Zubić, Nico Messikommer, Siddhartha Mishra, Davide Scaramuzza
04 Oct 2024
Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
03 Oct 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, ..., Freda Shi, Bailin Wang, Wei Bi, P. Zhou, Guohong Fu
11 Sep 2024
Learning Randomized Algorithms with Transformers
J. Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger
20 Aug 2024 (AAML)
Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qian Liu
19 Jul 2024
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li, A. S. Rawat, Samet Oymak
13 Jul 2024
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, ..., Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin
05 Jul 2024
On the Anatomy of Attention
Nikhil Khatri, Tuomas Laakkonen, Jonathon Liu, Vincent Wang-Maścianica
02 Jul 2024 (3DV)
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
R. Teo, Tan M. Nguyen
19 Jun 2024
Breaking the Attention Bottleneck
Kalle Hilsenbek
16 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
14 Jun 2024 (RALM, ALM, LRM, ReLM, ELM)
Discrete Dictionary-based Decomposition Layer for Structured Representation Learning
Taewon Park, Hyun-Chul Kim, Minho Lee
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen
11 Jun 2024 (Mamba)
Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
09 Jun 2024 (GNN)
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Ali Behrouz, Michele Santacatterina, Ramin Zabih
06 Jun 2024 (Mamba, AI4TS)
Attention-based Iterative Decomposition for Tensor Product Representation
Taewon Park, Inchul Choi, Minho Lee
03 Jun 2024
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang
30 May 2024