arXiv:2509.13930

Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG

17 September 2025
Dayeon Ki
Marine Carpuat
Paul McNamee
Daniel Khashabi
Eugene Yang
Dawn J Lawrie
Kevin Duh
Main: 14 pages; Bibliography: 3 pages; Appendix: 16 pages; 22 figures; 14 tables
Abstract

Multilingual Retrieval-Augmented Generation (mRAG) systems enable language models to answer knowledge-intensive queries with citation-supported responses across languages. While such systems have been proposed, an open question is whether the mixture of different document languages impacts generation and citation in unintended ways. To investigate, we introduce a controlled methodology using model internals to measure language preference while holding other factors such as document relevance constant. Across eight languages and six open-weight models, we find that models preferentially cite English sources when queries are in English, with this bias amplified for lower-resource languages and for documents positioned mid-context. Crucially, we find that models sometimes trade off document relevance for language preference, indicating that citation choices are not always driven by informativeness alone. Our findings shed light on how language models leverage multilingual context and how that context influences citation behavior.
