Augmenting Self-attention with Persistent Memory
2 July 2019
Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Hervé Jégou, Armand Joulin
Communities: RALM, KELM
arXiv: 1907.01470
Papers citing "Augmenting Self-attention with Persistent Memory" (30 of 30 papers shown)
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models · Isaac Gerber · 10 May 2025 · 0 citations
Human-inspired Perspectives: A Survey on AI Long-term Memory · Zihong He, Weizhe Lin, Hao Zheng, Fan Zhang, Matt Jones, Laurence Aitchison, X. Xu, Miao Liu, Per Ola Kristensson, Junxiao Shen · 01 Nov 2024 · 2 citations
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters · Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, J. E. Lenssen, Liwei Wang, F. Tombari, Bernt Schiele · 30 Oct 2024 · 2 citations
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory · Xueyan Niu, Bo Bai, Lei Deng, Wei Han · 14 May 2024 · 6 citations
Memory Mosaics · Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen, Léon Bottou · VLM · 10 May 2024 · 3 citations
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models · Luiza Amador Pozzobon, B. Ermiş, Patrick Lewis, Sara Hooker · 11 Oct 2023 · 20 citations
Transformer-VQ: Linear-Time Transformers via Vector Quantization · Albert Mohwald · 28 Sep 2023 · 15 citations
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning · Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara · VLM · 23 Aug 2023 · 19 citations
A Study on ReLU and Softmax in Transformer · Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian · 13 Feb 2023 · 53 citations
Interpretability in Activation Space Analysis of Transformers: A Focused Survey · Soniya Vijayakumar · AI4CE · 22 Jan 2023 · 3 citations
Bird-Eye Transformers for Text Generation Models · Lei Sha, Yuhang Song, Yordan Yordanov, Tommaso Salvatori, Thomas Lukasiewicz · 08 Oct 2022 · 0 citations
Analyzing Transformers in Embedding Space · Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant · 06 Sep 2022 · 83 citations
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization · T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang · 01 Aug 2022 · 9 citations
Exploring the sequence length bottleneck in the Transformer for Image Captioning · Jiapeng Hu, Roberto Cavicchioli, Alessandro Capotondi · ViT · 07 Jul 2022 · 3 citations
Plug-and-Play Adaptation for Continuously-updated QA · Kyungjae Lee, Wookje Han, Seung-won Hwang, Hwaran Lee, Joonsuk Park, Sang-Woo Lee · KELM · 27 Apr 2022 · 16 citations
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space · Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg · KELM · 28 Mar 2022 · 333 citations
ANNA: Enhanced Language Representation for Question Answering · Changwook Jun, Hansol Jang, Myoseop Sim, Hyun Kim, Jooyoung Choi, Kyungkoo Min, Kyunghoon Bae · 28 Mar 2022 · 6 citations
Memorizing Transformers · Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy · RALM · 16 Mar 2022 · 172 citations
CaMEL: Mean Teacher Learning for Image Captioning · Manuele Barraco, Matteo Stefanini, Marcella Cornia, S. Cascianelli, Lorenzo Baraldi, Rita Cucchiara · ViT, VLM · 21 Feb 2022 · 27 citations
The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention · Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022 · 42 citations
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling · Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li · VLM · 06 Nov 2021 · 385 citations
Combining Transformers with Natural Language Explanations · Federico Ruggeri, Marco Lippi, Paolo Torroni · 02 Sep 2021 · 1 citation
A Survey of Transformers · Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu · ViT · 08 Jun 2021 · 1,087 citations
Mask Attention Networks: Rethinking and Strengthen Transformer · Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang · 25 Mar 2021 · 72 citations
Transformer Feed-Forward Layers Are Key-Value Memories · Mor Geva, R. Schuster, Jonathan Berant, Omer Levy · KELM · 29 Dec 2020 · 741 citations
Efficient Transformers: A Survey · Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · VLM · 14 Sep 2020 · 1,101 citations
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition · Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie · 21 May 2020 · 33 citations
Meshed-Memory Transformer for Image Captioning · Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara · 17 Dec 2019 · 868 citations
DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition · Zhao You, Dan Su, Jie Chen, Chao Weng, Dong Yu · 28 Oct 2019 · 13 citations
Transformers without Tears: Improving the Normalization of Self-Attention · Toan Q. Nguyen, Julian Salazar · 14 Oct 2019 · 224 citations