Birth of a Transformer: A Memory Viewpoint (arXiv:2306.00802)
1 June 2023
A. Bietti, Vivien A. Cabannes, Diane Bouchacourt, Hervé Jégou, Léon Bottou

Papers citing "Birth of a Transformer: A Memory Viewpoint" (showing 17 of 67 papers)
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
  Simone Bombari, Marco Mondelli (05 Feb 2024)

Self-attention Networks Localize When QK-eigenspectrum Concentrates
  Han Bao, Ryuichiro Hataya, Ryo Karakida (03 Feb 2024)

In-Context Learning Dynamics with Random Binary Sequences
  Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman (26 Oct 2023)

On the Optimization and Generalization of Multi-head Attention
  Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis (19 Oct 2023) [MLT]

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
  Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai (16 Oct 2023)

Scaling Laws for Associative Memories
  Vivien A. Cabannes, Elvis Dohmatob, A. Bietti (04 Oct 2023)

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du (01 Oct 2023)

Breaking through the learning plateaus of in-context learning in Transformer
  Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng (12 Sep 2023)

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
  Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang (27 Aug 2023)

Bidirectional Attention as a Mixture of Continuous Word Experts
  Kevin Christian Wibisono, Yixin Wang (08 Jul 2023) [MoE]

Dissecting Recall of Factual Associations in Auto-Regressive Language Models
  Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson (28 Apr 2023) [KELM]

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
  Yuchen Li, Yuan-Fang Li, Andrej Risteski (07 Mar 2023)

A Survey on In-context Learning
  Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, ..., Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, Zhifang Sui (31 Dec 2022) [ReLM, AIMat]

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt (01 Nov 2022)

In-context Learning and Induction Heads
  Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah (24 Sep 2022)

Toy Models of Superposition
  Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah (21 Sep 2022) [AAML, MILM]

Efficient Estimation of Word Representations in Vector Space
  Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean (16 Jan 2013) [3DV]