ResearchTrend.AI
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy · 29 December 2020 · KELM

Papers citing "Transformer Feed-Forward Layers Are Key-Value Memories"

28 / 128 papers shown
Memorization Capacity of Multi-Head Attention in Transformers
Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis · 03 Jun 2023

Explaining How Transformers Use Context to Build Predictions
Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussá · 21 May 2023

Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
Byung-Doh Oh, William Schuler · 17 May 2023

N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models
Alex Foote, Neel Nanda, Esben Kran, Ionnis Konstas, Fazl Barez · 22 Apr 2023 · MILM

Computational modeling of semantic change
Nina Tahmasebi, Haim Dubossarsky · 13 Apr 2023

Factorizers for Distributed Sparse Block Codes
Michael Hersche, Aleksandar Terzić, G. Karunaratne, Jovin Langenegger, Angeline Pouget, G. Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi · 24 Mar 2023

LabelPrompt: Effective Prompt-based Learning for Relation Classification
W. Zhang, Xiaoning Song, Zhenhua Feng, Tianyang Xu, Xiaojun Wu · 16 Feb 2023 · VLM

Interpretability in Activation Space Analysis of Transformers: A Focused Survey
Soniya Vijayakumar · 22 Jan 2023 · AI4CE

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun · 10 Jan 2023 · MILM

Rank-One Editing of Encoder-Decoder Models
Vikas Raunak, Arul Menezes · 23 Nov 2022 · KELM

Interpreting Neural Networks through the Polytope Lens
Sid Black, Lee D. Sharkey, Léo Grinsztajn, Eric Winsor, Daniel A. Braun, ..., Kip Parker, Carlos Ramón Guevara, Beren Millidge, Gabriel Alfour, Connor Leahy · 22 Nov 2022 · FAtt, MILM

Finding Skill Neurons in Pre-trained Transformer-based Language Models
Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li · 14 Nov 2022 · MILM, MoE

Large Language Models with Controllable Working Memory
Daliang Li, A. S. Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix X. Yu, Surinder Kumar · 09 Nov 2022 · KELM

Understanding Transformer Memorization Recall Through Idioms
Adi Haviv, Ido Cohen, Jacob Gidron, R. Schuster, Yoav Goldberg, Mor Geva · 07 Oct 2022

Calibrating Factual Knowledge in Pretrained Language Models
Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, Lei Li · 07 Oct 2022 · KELM

Analyzing Transformers in Embedding Space
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant · 06 Sep 2022

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
Timothee Mickus, Denis Paperno, Mathieu Constant · 07 Jun 2022

Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion
Xiang Chen, Ningyu Zhang, Lei Li, Shumin Deng, Chuanqi Tan, Changliang Xu, Fei Huang, Luo Si, Huajun Chen · 04 May 2022

Plug-and-Play Adaptation for Continuously-updated QA
Kyungjae Lee, Wookje Han, Seung-won Hwang, Hwaran Lee, Joonsuk Park, Sang-Woo Lee · 27 Apr 2022 · KELM

Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Tri Dao, Beidi Chen, N. Sohoni, Arjun D Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré · 01 Apr 2022

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg · 28 Mar 2022 · KELM

Locating and Editing Factual Associations in GPT
Kevin Meng, David Bau, A. Andonian, Yonatan Belinkov · 10 Feb 2022 · KELM

Generative Modeling of Complex Data
Luca Canale, Nicolas Grislain, Grégoire Lothe, Johanne Leduc · 04 Feb 2022 · SyDa

Kformer: Knowledge Injection in Transformer Feed-Forward Layers
Yunzhi Yao, Shaohan Huang, Li Dong, Furu Wei, Huajun Chen, Ningyu Zhang · 15 Jan 2022 · KELM, MedIm

On Transferability of Prompt Tuning for Natural Language Processing
Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, ..., Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie Zhou · 12 Nov 2021 · AAML, VLM

Towards a Unified View of Parameter-Efficient Transfer Learning
Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig · 08 Oct 2021 · AAML

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou · 05 Oct 2021 · MoE

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation
Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, Wenming Yang · 26 Mar 2021 · ViT