PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs. International Conference on Learning Representations (ICLR), 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
A distributional simplicity bias in the learning dynamics of transformers. Neural Information Processing Systems (NeurIPS), 2024
Tending Towards Stability: Convergence Challenges in Small Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient. International Conference on Learning Representations (ICLR), 2024