Linear Transformers Are Secretly Fast Weight Programmers
arXiv:2102.11174 · 22 February 2021
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

Papers citing "Linear Transformers Are Secretly Fast Weight Programmers" (50 of 162 shown)

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber · MoE, VLM · 01 May 2025

RWKV-X: A Linear Complexity Hybrid Language Model
Haowen Hou, Zhiyi Huang, Kaifeng Tan, Rongchang Lu, Fei Richard Yu · VLM · 30 Apr 2025

Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
Nischal Mainali, Lucas Teixeira · 17 Apr 2025

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni · 17 Apr 2025

Bidirectional Linear Recurrent Models for Sequence-Level Multisource Fusion
Qisai Liu, Zhanhong Jiang, Joshua R. Waite, Chao Liu, Aditya Balu, S. Sarkar · AI4TS · 11 Apr 2025

One-Minute Video Generation with Test-Time Training
Karan Dalal, Daniel Koceja, Gashon Hussein, Jiarui Xu, Yue Zhao, ..., Tatsunori Hashimoto, Sanmi Koyejo, Yejin Choi, Yu Sun, Xiaolong Wang · ViT · 07 Apr 2025

Decoding Recommendation Behaviors of In-Context Learning LLMs Through Gradient Descent
Yi Xu, Weicong Qin, Weijie Yu, Ming He, Jianping Fan, Jun Xu · 06 Apr 2025

ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
Nicola Muca Cirone, C. Salvi · 01 Apr 2025

Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck, Korbinian Poppel, Phillip Lippe, Sepp Hochreiter · 18 Mar 2025

Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar, Asim Kadav · VLM · 17 Mar 2025

Measuring In-Context Computation Complexity via Hidden State Prediction
Vincent Herrmann, Róbert Csordás, Jürgen Schmidhuber · 17 Mar 2025

Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto · 13 Mar 2025

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu-Xi Cheng · MoE · 07 Mar 2025

Associative Recurrent Memory Transformer
Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev · 17 Feb 2025

Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe · MLT · 28 Jan 2025

Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao · 11 Jan 2025

Key-value memory in the brain
Samuel J. Gershman, Ila Fiete, Kazuki Irie · 06 Jan 2025

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Riccardo Grazzi, Julien N. Siems, Jörg K.H. Franke, Arber Zela, Frank Hutter, Massimiliano Pontil · 19 Nov 2024

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, J. Wu, Bo Xu, Guoqi Li · 16 Nov 2024

StreamAdapter: Efficient Test Time Adaptation from Contextual Streams
Dilxat Muhtar, Yelong Shen, Y. Yang, Xiaodong Liu, Yadong Lu, ..., Feng Sun, Xueliang Zhang, Jianfeng Gao, Weizhu Chen, Qi Zhang · TTA · 14 Nov 2024

Automatic Album Sequencing
Vincent Herrmann, Dylan R. Ashley, Jürgen Schmidhuber · 12 Nov 2024

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng · KELM · 08 Nov 2024

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü, Destiny Okpekpe, Antonio Orvieto · Mamba · 31 Oct 2024

A Walsh Hadamard Derived Linear Vector Symbolic Architecture
Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff, Stella Biderman, Tim Oates, James Holt · LLMSV · 30 Oct 2024

On the Role of Depth and Looping for In-Context Learning with Task Diversity
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar · 29 Oct 2024

FACTS: A Factored State-Space Framework For World Modelling
Li Nanbo, Firas Laakom, Yucheng Xu, Wenyi Wang, Jürgen Schmidhuber · AI4TS · 28 Oct 2024

Graph Transformers Dream of Electric Flow
Xiang Cheng, Lawrence Carin, S. Sra · 22 Oct 2024

Quadratic Gating Functions in Mixture of Experts: A Statistical Insight
Pedram Akbarian, Huy Le Nguyen, Xing Han, Nhat Ho · MoE · 15 Oct 2024

Neural networks that overcome classic challenges through practice
Kazuki Irie, Brenden M. Lake · 14 Oct 2024

Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
H. Le, Kien Do, D. Nguyen, Sunil Gupta, Svetha Venkatesh · 14 Oct 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar · 10 Oct 2024

DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning
Yichen Song, Yunbo Wang, Xiaokang Yang · AI4CE · 08 Oct 2024

xLSTM-FER: Enhancing Student Expression Recognition with Extended Vision Long Short-Term Memory Network
Qionghao Huang, Jili Chen · 07 Oct 2024

S7: Selective and Simplified State Space Layers for Sequence Modeling
Taylan Soydan, Nikola Zubić, Nico Messikommer, Siddhartha Mishra, Davide Scaramuzza · 04 Oct 2024

Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias · 03 Oct 2024

Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, ..., Freda Shi, Bailin Wang, Wei Bi, P. Zhou, Guohong Fu · 11 Sep 2024

Learning Randomized Algorithms with Transformers
J. Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger · AAML · 20 Aug 2024

Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qian Liu · 19 Jul 2024

Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li, A. S. Rawat, Samet Oymak · 13 Jul 2024

Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, ..., Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin · 05 Jul 2024

On the Anatomy of Attention
Nikhil Khatri, Tuomas Laakkonen, Jonathon Liu, Vincent Wang-Maścianica · 3DV · 02 Jul 2024

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
R. Teo, Tan M. Nguyen · 19 Jun 2024

Breaking the Attention Bottleneck
Kalle Hilsenbek · 16 Jun 2024

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev · RALM, ALM, LRM, ReLM, ELM · 14 Jun 2024

Discrete Dictionary-based Decomposition Layer for Structured Representation Learning
Taewon Park, Hyun-Chul Kim, Minho Lee · 11 Jun 2024

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen · Mamba · 11 Jun 2024

Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu · GNN · 09 Jun 2024

Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Ali Behrouz, Michele Santacatterina, Ramin Zabih · Mamba, AI4TS · 06 Jun 2024

Attention-based Iterative Decomposition for Tensor Product Representation
Taewon Park, Inchul Choi, Minho Lee · 03 Jun 2024

Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang · 30 May 2024