Language models scale reliably with over-training and on downstream tasks. International Conference on Learning Representations (ICLR), 2024.
Small-scale proxies for large-scale Transformer training instabilities. International Conference on Learning Representations (ICLR), 2023.
Language Models Understand Us, Poorly. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
A Logic for Expressing Log-Precision Transformers. Neural Information Processing Systems (NeurIPS), 2022.
The Parallelism Tradeoff: Limitations of Log-Precision Transformers. Transactions of the Association for Computational Linguistics (TACL), 2022.
Overcoming a Theoretical Limitation of Self-Attention. Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Saturated Transformers are Constant-Depth Threshold Circuits. Transactions of the Association for Computational Linguistics (TACL), 2021.