Mechanics of Next Token Prediction with Self-Attention

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
12 March 2024
Yingcong Li
Yixiao Huang
M. E. Ildiz
A. S. Rawat
Samet Oymak

Papers citing "Mechanics of Next Token Prediction with Self-Attention"

16 / 16 papers shown
Towards Understanding Transformers in Learning Random Walks
Wei Shi
Yuan Cao
125
1
0
28 Nov 2025
Fighter: Unveiling the Graph Convolutional Nature of Transformers in Time Series Modeling
Chen Zhang
Weixin Bu
Wendong Xu
Runsheng Yu
Yik-Chung Wu
Ngai Wong
AI4TS, BDL
437
0
0
20 Oct 2025
Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
Tina Behnia
Puneesh Deora
Christos Thrampoulidis
150
0
0
17 Oct 2025
Decoupling Positional and Symbolic Attention Behavior in Transformers
Felipe Urrutia
Jorge Salas
Alexander Kozachinskiy
Cristian B. Calderon
Hector Pasten
Cristóbal Rojas
135
1
0
03 Oct 2025
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Hanlin Zhu
Shibo Hao
Zhiting Hu
Jiantao Jiao
Stuart Russell
Yuandong Tian
LRM
238
9
0
27 Sep 2025
Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection
Anja Delić
Matej Grcić
Sinisa Segvic
247
3
0
23 Jun 2025
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Yixiao Huang
Hanlin Zhu
Tianyu Guo
Jiantao Jiao
Somayeh Sojoudi
Michael I. Jordan
Stuart Russell
Song Mei
LRM
742
7
0
12 Jun 2025
FloorPlan-DeepSeek (FPDS): A multimodal approach to floorplan generation using vector-based next room prediction
Jun Yin
Pengyu Zeng
Jing Zhong
Peilin Li
Miao Zhang
Ran Luo
Shuai Lu
3DV
220
7
0
12 Jun 2025
How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization
Quan Nguyen
Thanh Nguyen-Tang
MLT
572
1
0
21 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
762
5
0
02 May 2025
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
532
3
0
21 Feb 2025
Rethinking Associative Memory Mechanism in Induction Head
Shuo Wang
Issei Sato
561
0
0
16 Dec 2024
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
International Conference on Learning Representations (ICLR), 2024
Renpu Liu
Ruida Zhou
Cong Shen
Jing Yang
584
4
0
17 Oct 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
International Conference on Learning Representations (ICLR), 2024
Hongkang Li
Songtao Lu
Pin-Yu Chen
Xiaodong Cui
Meng Wang
LRM
605
13
0
03 Oct 2024
The pitfalls of next-token prediction
International Conference on Machine Learning (ICML), 2024
Gregor Bachmann
Vaishnavh Nagarajan
600
157
0
11 Mar 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
531
31
0
08 Feb 2024