Birth of a Transformer: A Memory Viewpoint (arXiv:2306.00802)
1 June 2023
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou

Papers citing "Birth of a Transformer: A Memory Viewpoint" (showing 17 of 67 papers)
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
  Simone Bombari, Marco Mondelli (05 Feb 2024)

Self-attention Networks Localize When QK-eigenspectrum Concentrates
  Han Bao, Ryuichiro Hataya, Ryo Karakida (03 Feb 2024)

In-Context Learning Dynamics with Random Binary Sequences
  Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman (26 Oct 2023)

On the Optimization and Generalization of Multi-head Attention
  Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis (19 Oct 2023) [MLT]

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
  Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai (16 Oct 2023)

Scaling Laws for Associative Memories
  Vivien A. Cabannes, Elvis Dohmatob, A. Bietti (04 Oct 2023)

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du (01 Oct 2023)

Breaking through the learning plateaus of in-context learning in Transformer
  Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng (12 Sep 2023)

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
  Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang (27 Aug 2023)

Bidirectional Attention as a Mixture of Continuous Word Experts
  Kevin Christian Wibisono, Yixin Wang (08 Jul 2023) [MoE]

Dissecting Recall of Factual Associations in Auto-Regressive Language Models
  Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson (28 Apr 2023) [KELM]

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
  Yuchen Li, Yuan-Fang Li, Andrej Risteski (07 Mar 2023)

A Survey on In-context Learning
  Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, ..., Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, Zhifang Sui (31 Dec 2022) [ReLM, AIMat]

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt (01 Nov 2022)

In-context Learning and Induction Heads
  Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah (24 Sep 2022)

Toy Models of Superposition
  Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah (21 Sep 2022) [AAML, MILM]

Efficient Estimation of Word Representations in Vector Space
  Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean (16 Jan 2013) [3DV]