
| Title | Venue |
|---|---|
| When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers | International Conference on Learning Representations (ICLR), 2025 |
| A distributional simplicity bias in the learning dynamics of transformers | Neural Information Processing Systems (NeurIPS), 2024 |
| DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations | Computer Vision and Pattern Recognition (CVPR), 2025 |
| Geometric Signatures of Compositionality Across a Language Model's Lifetime | Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
| A phase transition between positional and semantic learning in a solvable model of dot-product attention | Neural Information Processing Systems (NeurIPS), 2024 |
| JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention | International Conference on Learning Representations (ICLR), 2023 |
| Saddle-to-Saddle Dynamics in Diagonal Linear Networks | Neural Information Processing Systems (NeurIPS), 2023 |