Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From

Cross-lingual context retrieval, in which a model extracts information from context in one language based on a request in another, is a fundamental aspect of the cross-lingual alignment of large language models (LLMs). Despite its importance in real-life applications, this ability has not been adequately investigated for state-of-the-art models. In this paper, we evaluate the cross-lingual context retrieval ability of over 40 LLMs across 12 languages to understand the source of this ability, using cross-lingual machine reading comprehension (xMRC) as a representative scenario. Our results show that several small, post-trained open LLMs exhibit strong cross-lingual context retrieval ability, comparable to closed-source LLMs such as GPT-4o, and that their estimated oracle performance improves greatly after post-training. Our interpretability analysis shows that the cross-lingual context retrieval process can be divided into two main phases: question encoding and answer retrieval, which are formed during pre-training and post-training, respectively. The stability of this phasing correlates with xMRC performance, and the xMRC bottleneck lies in the last model layers of the second phase, where the effect of post-training is clearly observable. Our results also indicate that larger-scale pre-training alone cannot improve xMRC performance; instead, larger LLMs need further multilingual post-training to fully unlock their cross-lingual context retrieval potential. Our code is available at this https URL.
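As a concrete illustration of the xMRC scenario described above, the following Python sketch poses a question in one language about a context passage in another language and checks whether the gold answer span appears in the model's output. The model name, prompt template, and substring-match check are illustrative assumptions for this sketch, not the evaluation protocol used in the paper.

# Minimal xMRC probe, assuming a Hugging Face causal LM.
# Model name, prompt wording, and scoring are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder small open model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def xmrc_answer(context: str, question: str, max_new_tokens: int = 32) -> str:
    """Answer a question asked in one language about a context in another."""
    prompt = (
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer briefly with a span from the context:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# English context, German question; the gold span stays in the context language.
context = "The Amazon rainforest spans nine countries in South America."
question = "Wie viele Laender umfasst der Amazonas-Regenwald?"  # "How many countries does it span?"
prediction = xmrc_answer(context, question)
print(prediction, "| correct:", "nine" in prediction.lower())

In a full evaluation, such a probe would be repeated over many context-question language pairs and models, with a proper span-matching metric in place of the simple substring check used here.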
@article{gao2025_2504.10906,
  title   = {Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From},
  author  = {Changjiang Gao and Hankun Lin and Shujian Huang and Xin Huang and Xue Han and Junlan Feng and Chao Deng and Jiajun Chen},
  journal = {arXiv preprint arXiv:2504.10906},
  year    = {2025}
}