arXiv:2305.16380
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
25 May 2023 · Yuandong Tian, Yiping Wang, Beidi Chen, S. Du
Papers citing
"Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer"
15 papers
| Title | Authors | Date |
|---|---|---|
| How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias | Ruiquan Huang, Yingbin Liang, Jing Yang | 02 May 2025 |
| On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery | Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang | 17 Oct 2024 |
| From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency | Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang | 07 Oct 2024 |
| Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization | Xinhao Yao, Hongjin Qian, Xiaolin Hu, Gengze Xu, Wei Liu, Jian Luan, B. Wang, Y. Liu | 03 Oct 2024 |
| Attention layers provably solve single-location regression | P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer | 02 Oct 2024 |
| Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers | Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky | 24 May 2024 |
| Linear Transformers are Versatile In-Context Learners | Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge | 21 Feb 2024 |
| Implicit Bias and Fast Convergence Rates for Self-attention | Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis | 08 Feb 2024 |
| An Information-Theoretic Analysis of In-Context Learning | Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy | 28 Jan 2024 |
| Do Transformers Parse while Predicting the Masked Word? | Haoyu Zhao, A. Panigrahi, Rong Ge, Sanjeev Arora | 14 Mar 2023 |
| How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding | Yuchen Li, Yuan-Fang Li, Andrej Risteski | 07 Mar 2023 |
| In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | 24 Sep 2022 |
| Understanding Deep Contrastive Learning via Coordinate-wise Optimization | Yuandong Tian | 29 Jan 2022 |
| Masked Autoencoders Are Scalable Vision Learners | Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick | 11 Nov 2021 |
| Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | Ofir Press, Noah A. Smith, M. Lewis | 27 Aug 2021 |