Mechanics of Next Token Prediction with Self-Attention

12 March 2024

Papers citing "Mechanics of Next Token Prediction with Self-Attention"

7 / 7 papers shown

Title
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias Ruiquan Huang Yingbin Liang Jing Yang 46 0 0 02 May 2025
Reasoning Bias of Next Token Prediction Training Pengxiao Lin Zhongwang Zhang Zhi-Qin John Xu LRM 80 1 0 21 Feb 2025
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery Renpu Liu Ruida Zhou Cong Shen Jing Yang 23 0 0 17 Oct 2024
Implicit Bias and Fast Convergence Rates for Self-attention Bhavya Vasudeva Puneesh Deora Christos Thrampoulidis 24 13 0 08 Feb 2024
Provably learning a multi-head attention layer Sitan Chen Yuanzhi Li MLT 26 14 0 06 Feb 2024
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features Simone Bombari Marco Mondelli 26 3 0 05 Feb 2024
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding Yuchen Li Yuan-Fang Li Andrej Risteski 107 61 0 07 Mar 2023