
Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

Main: 11 pages · Appendix: 29 pages · Bibliography: 2 pages · 48 figures · 5 tables
Abstract

Multilingual Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does performance favor the dominant training language? To address this, we train models on different multilingual mixtures and analyze their internal mechanisms using Cross-Layer Transcoders (CLTs) and Attribution Graphs. Our results reveal shared multilingual representations: the models employ highly similar features across languages, while language-specific decoding emerges in later layers.
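To make the "highly similar features across languages" claim concrete, here is a minimal sketch of one way to quantify cross-lingual feature overlap from CLT activations. It assumes you already have a per-feature activation vector for each prompt; the names `acts_en` / `acts_de` and the top-k Jaccard-overlap metric are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch: quantify cross-lingual feature overlap from CLT activations.
import numpy as np

def top_k_features(acts: np.ndarray, k: int = 50) -> set[int]:
    """Return indices of the k most active CLT features for one prompt."""
    return set(np.argsort(acts)[-k:])

def cross_lingual_overlap(acts_a: np.ndarray, acts_b: np.ndarray, k: int = 50) -> float:
    """Jaccard overlap of the top-k active features for a parallel prompt pair."""
    a, b = top_k_features(acts_a, k), top_k_features(acts_b, k)
    return len(a & b) / len(a | b)

# Example with random stand-in activations (replace with real CLT feature outputs).
rng = np.random.default_rng(0)
acts_en = rng.random(4096)   # feature activations for an English prompt
acts_de = rng.random(4096)   # feature activations for its German translation
print(f"top-50 feature overlap: {cross_lingual_overlap(acts_en, acts_de):.2f}")
```

A high overlap for translated prompt pairs, compared to unrelated prompts, would indicate shared multilingual features in the sense described above.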
