The Dual-Route Model of Induction

Abstract

Prior work on in-context copying has shown the existence of induction heads, which attend to and promote individual tokens during copying. In this work we introduce a new type of induction head: concept-level induction heads, which copy entire lexical units instead of individual tokens. Concept induction heads learn to attend to the ends of multi-token words throughout training, working in parallel with token-level induction heads to copy meaningful text. We show that these heads are responsible for semantic tasks like word-level translation, whereas token induction heads are vital for tasks that can only be done verbatim, like copying nonsense tokens. These two "routes" operate independently: in fact, we show that ablation of token induction heads causes models to paraphrase where they would otherwise copy verbatim. In light of these findings, we argue that although token induction heads are vital for specific tasks, concept induction heads may be more broadly relevant for in-context learning.

@article{feucht2025_2504.03022,
  title={The Dual-Route Model of Induction},
  author={Sheridan Feucht and Eric Todd and Byron Wallace and David Bau},
  journal={arXiv preprint arXiv:2504.03022},
  year={2025}
}