arXiv:2306.07042
Transformers learn through gradual rank increase
12 June 2023
Enric Boix-Adserà, Etai Littwin, Emmanuel Abbe, Samy Bengio, J. Susskind
Papers citing "Transformers learn through gradual rank increase"
12 / 12 papers shown
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang, Yingbin Liang, Jing Yang
46
0
0
02 May 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li, Yihua Zhang, Shuai Zhang, M. Wang, Sijia Liu, Pin-Yu Chen
MoMe
60
2
0
15 Apr 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende, Federica Gerace, A. Laio, Sebastian Goldt
68
8
0
17 Feb 2025
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro, Chaim Baskin, Moshe Eliasof
41
0
0
09 Feb 2025
Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng
CoGe
82
0
0
02 Oct 2024
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino, Sarath Shekkizhar
LRM
42
2
0
02 Jul 2024
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, H. Sompolinsky
42
6
0
24 May 2024
Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Scott Pesme, Nicolas Flammarion
17
35
0
02 Apr 2023
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
Emmanuel Abbe, Enric Boix-Adserà, Theodor Misiakiewicz
FedML
MLT
76
72
0
21 Feb 2023
Learning Single-Index Models with Shallow Neural Networks
A. Bietti, Joan Bruna, Clayton Sanford, M. Song
160
67
0
27 Oct 2022
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
Alireza Mousavi-Hosseini, Sejun Park, M. Girotti, Ioannis Mitliagkas, Murat A. Erdogdu
MLT
319
48
0
29 Sep 2022
Implicit Regularization in Deep Tensor Factorization
P. Milanesi, Hachem Kadri, Stéphane Ayache, Thierry Artières
40
9
0
04 May 2021