Memorization Capacity of Multi-Head Attention in Transformers

Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis · 3 June 2023
Papers citing "Memorization Capacity of Multi-Head Attention in Transformers"

15 papers:

1. Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
   Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan · 18 Apr 2025

2. Taming Knowledge Conflicts in Language Models
   Gaotang Li, Yuzhong Chen, Hanghang Tong · KELM · 14 Mar 2025

3. Mixture of Parrots: Experts improve memorization more than reasoning
   Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach · MoE · 24 Oct 2024

4. Undesirable Memorization in Large Language Models: A Survey
   Ali Satvaty, Suzan Verberne, Fatih Turkmen · ELM, PILM · 03 Oct 2024

5. How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
   Xingwu Chen, Lei Zhao, Difan Zou · 08 Aug 2024

6. Empirical Capacity Model for Self-Attention Neural Networks
   Aki Härmä, M. Pietrasik, Anna Wilbik · 22 Jul 2024

7. Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
   Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam · CLL, KELM · 26 Jun 2024

8. What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
   Xingwu Chen, Difan Zou · ViT · 02 Apr 2024

9. Prompting a Pretrained Transformer Can Be a Universal Approximator
   Aleksandar Petrov, Philip H. S. Torr, Adel Bibi · 22 Feb 2024

10. Implicit Bias and Fast Convergence Rates for Self-attention
    Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis · 08 Feb 2024

11. Superiority of Multi-Head Attention in In-Context Linear Regression
    Yingqian Cui, Jie Ren, Pengfei He, Jiliang Tang, Yue Xing · 30 Jan 2024

12. On the Optimization and Generalization of Multi-head Attention
    Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis · MLT · 19 Oct 2023

13. Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
    T. Kajitsuka, Issei Sato · 26 Jul 2023

14. Your Transformer May Not be as Powerful as You Expect
    Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He · 26 May 2022

15. Extracting Training Data from Large Language Models
    Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel · MLAU, SILM · 14 Dec 2020