A distributional simplicity bias in the learning dynamics of transformers. Neural Information Processing Systems (NeurIPS), 2024
A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
How Feature Learning Can Improve Neural Scaling Laws. International Conference on Learning Representations (ICLR), 2024
Learning time-scales in two-layers neural networks. Foundations of Computational Mathematics (FoCM), 2023