Towards a theory of how the structure of language is acquired by deep neural networks

28 May 2024

Papers citing "Towards a theory of how the structure of language is acquired by deep neural networks"

5 / 5 papers shown

Title
A distributional simplicity bias in the learning dynamics of transformers | Riccardo Rende, Federica Gerace, A. Laio, Sebastian Goldt | 68 7 0 | 17 Feb 2025
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents | Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala | MLT | 53 24 0 | 05 Feb 2024
A Dynamical Model of Neural Scaling Laws | Blake Bordelon, Alexander B. Atanasov, C. Pehlevan | 41 36 0 | 02 Feb 2024
Do Transformers Parse while Predicting the Masked Word? | Haoyu Zhao, A. Panigrahi, Rong Ge, Sanjeev Arora | 74 29 0 | 14 Mar 2023
Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | 220 3,054 0 | 23 Jan 2020