2306.00802
Birth of a Transformer: A Memory Viewpoint
1 June 2023
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
Papers citing
"Birth of a Transformer: A Memory Viewpoint"
50 / 67 papers shown
Title
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu
Kayo Yin
Michael I. Jordan
Jacob Steinhardt
Lijie Chen
53
0
0
08 May 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
88
0
0
22 Apr 2025
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Ali Behrouz
Meisam Razaviyayn
Peilin Zhong
Vahab Mirrokni
38
0
0
17 Apr 2025
Approximation Bounds for Transformer Networks with Application to Regression
Yuling Jiao
Yanming Lai
Defeng Sun
Yang Wang
Bokai Yan
29
0
0
16 Apr 2025
Taming Knowledge Conflicts in Language Models
Gaotang Li
Yuzhong Chen
Hanghang Tong
KELM
46
1
0
14 Mar 2025
Real-Time Personalization with Simple Transformers
Lin An
Andrew A. Li
Vaisnavi Nemala
Gabriel Visotsky
31
0
0
01 Mar 2025
Hyperspherical Energy Transformer with Recurrent Depth
Yunzhe Hu
Difan Zou
Dong Xu
46
0
0
17 Feb 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin
Zhaoran Wang
LRM
ReLM
143
0
0
27 Jan 2025
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes
Marco Christiani
David Shriver
Marissa Connor
KELM
80
1
0
17 Dec 2024
Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory
Shuo Wang
Issei Sato
76
0
0
16 Dec 2024
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Yunzhe Hu
Difan Zou
Dong Xu
74
1
0
26 Nov 2024
Leveraging Large Language Models for Enhancing Public Transit Services
Jiahao Wang
Amer Shalaby
29
0
0
18 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
29
9
0
17 Oct 2024
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
96
19
0
15 Oct 2024
Learning Linear Attention in Polynomial Time
Morris Yau
Ekin Akyürek
Jiayuan Mao
Joshua B. Tenenbaum
Stefanie Jegelka
Jacob Andreas
19
2
0
14 Oct 2024
Zero-Shot Generalization of Vision-Based RL Without Data Augmentation
S. Batra
Gaurav Sukhatme
OffRL
DRL
28
1
0
09 Oct 2024
Transformers learn variable-order Markov chains in-context
Ruida Zhou
C. Tian
Suhas Diggavi
26
0
0
07 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng
Yihang Gao
Han Shi
Jing Xiong
Jiankai Sun
...
Xiaozhe Ren
Michael Ng
Xin Jiang
Zhenguo Li
Yu Li
31
2
0
07 Oct 2024
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Toni J. B. Liu
Nicolas Boullé
Raphael Sarfati
Christopher Earls
28
0
0
07 Oct 2024
Large Language Models as Markov Chains
Oussama Zekri
Ambroise Odonnat
Abdelhakim Benechehab
Linus Bleistein
Nicolas Boullé
I. Redko
40
10
0
03 Oct 2024
Attention layers provably solve single-location regression
P. Marion
Raphael Berthier
Gérard Biau
Claire Boyer
140
2
0
02 Oct 2024
Attention Heads of Large Language Models: A Survey
Zifan Zheng
Yezhaohui Wang
Yuxin Huang
Shichao Song
Mingchuan Yang
Bo Tang
Feiyu Xiong
Zhiyu Li
LRM
58
21
0
05 Sep 2024
One-layer transformers fail to solve the induction heads task
Clayton Sanford
Daniel J. Hsu
Matus Telgarsky
35
8
0
26 Aug 2024
Spin glass model of in-context learning
Yuhao Li
Ruoran Bai
Haiping Huang
LRM
42
0
0
05 Aug 2024
MCGMark: An Encodable and Robust Online Watermark for Tracing LLM-Generated Malicious Code
Peng Ding
Jingyu Wu
Qingyuan Zhong
Dan Ma
Xunliang Cai
...
Shi Chen
Weizhe Zhang
Zibin Zheng
48
0
0
02 Aug 2024
Transformers on Markov Data: Constant Depth Suffices
Nived Rajaraman
Marco Bondaschi
Kannan Ramchandran
Michael C. Gastpar
Ashok Vardhan Makkuva
45
4
0
25 Jul 2024
Empirical Capacity Model for Self-Attention Neural Networks
Aki Härmä
M. Pietrasik
Anna Wilbik
34
1
0
22 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang
Goutham Rajendran
Pradeep Ravikumar
Bryon Aragam
CLL
KELM
34
6
0
26 Jun 2024
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Brian K Chen
Tianyang Hu
Hui Jin
Hwee Kuan Lee
Kenji Kawaguchi
50
0
0
05 Jun 2024
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Siavash Golkar
Alberto Bietti
Mariel Pettee
Michael Eickenberg
M. Cranmer
...
Ruben Ohana
Liam Parker
Bruno Régaldo-Saint Blancard
Kyunghyun Cho
Shirley Ho
47
1
0
30 May 2024
TAIA: Large Language Models are Out-of-Distribution Data Learners
Shuyang Jiang
Yusheng Liao
Ya-Qin Zhang
Yu Wang
Yanfeng Wang
29
3
0
30 May 2024
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi
Junyi Wei
Zhuoyan Xu
Yingyu Liang
37
18
0
30 May 2024
CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration
Sophie Jaffard
Samuel Vaiter
Patricia Reynaud-Bouret
71
0
0
29 May 2024
IM-Context: In-Context Learning for Imbalanced Regression Tasks
Ismail Nejjar
Faez Ahmed
Olga Fink
27
1
0
28 May 2024
Asymptotic theory of in-context learning by linear attention
Yue M. Lu
Mary I. Letey
Jacob A. Zavatone-Veth
Anindita Maiti
C. Pehlevan
26
10
0
20 May 2024
Memory Mosaics
Jianyu Zhang
Niklas Nolte
Ranajoy Sadhukhan
Beidi Chen
Léon Bottou
VLM
70
3
0
10 May 2024
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu
Baihe Huang
Shaolun Zhang
Michael I. Jordan
Jiantao Jiao
Yuandong Tian
Stuart Russell
LRM
AI4CE
49
13
0
07 May 2024
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
Chenwei Xu
Yu-Chao Huang
Jerry Yao-Chieh Hu
Weijian Li
Ammar Gilani
H. Goan
Han Liu
52
19
0
04 Apr 2024
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
Jerry Yao-Chieh Hu
Pei-Hsuan Chang
Haozheng Luo
Hong-Yu Chen
Weijian Li
Wei-Po Wang
Han Liu
39
26
0
04 Apr 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
26
1
0
13 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
39
26
0
29 Feb 2024
Learning Associative Memories with Gradient Descent
Vivien A. Cabannes
Berfin Simsek
A. Bietti
38
6
0
28 Feb 2024
Prospector Heads: Generalized Feature Attribution for Large Models & Data
Gautam Machiraju
Alexander Derry
Arjun D Desai
Neel Guha
Amir-Hossein Karimi
James Zou
Russ Altman
Christopher Ré
Parag Mallick
AI4TS
MedIm
48
0
0
18 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman
Ezra Edelman
Surbhi Goel
Eran Malach
Nikolaos Tsilivis
BDL
26
42
0
16 Feb 2024
Understanding In-Context Learning with a Pelican Soup Framework
Ting-Rui Chiang
Dani Yogatama
16
1
0
16 Feb 2024
Secret Collusion among Generative AI Agents: Multi-Agent Deception via Steganography
S. Motwani
Mikhail Baranchuk
Martin Strohmeier
Vijay Bolina
Philip H. S. Torr
Lewis Hammond
Christian Schroeder de Witt
40
4
0
12 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
34
13
0
08 Feb 2024
Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
Ashok Vardhan Makkuva
Marco Bondaschi
Adway Girish
Alliot Nagle
Martin Jaggi
Hyeji Kim
Michael C. Gastpar
OffRL
18
25
0
06 Feb 2024
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Hugo Cui
Freya Behrens
Florent Krzakala
Lenka Zdeborová
MLT
30
11
0
06 Feb 2024