Assessing Large Language Models in Agentic Multilingual National Bias

Abstract

Large Language Models (LLMs) have garnered significant attention for their capabilities in multilingual natural language processing, yet studies of the associated risks of cross-lingual bias have been limited to immediate context preferences. Cross-language disparities in reasoning-based recommendations remain largely unexplored, lacking even descriptive analysis. This study is the first to address this gap. We test LLMs' applicability and capability in providing personalized advice across three key scenarios: university applications, travel, and relocation. We investigate multilingual bias in state-of-the-art LLMs by analyzing their responses to decision-making tasks across multiple languages. We quantify bias in model-generated scores and assess the impact of demographic factors and reasoning strategies (e.g., Chain-of-Thought prompting) on bias patterns. Our findings reveal that local-language bias is prevalent across tasks: GPT-4 and Sonnet reduce bias for English-speaking countries relative to GPT-3.5 but fail to achieve robust multilingual alignment, highlighting broader implications for multilingual AI agents and applications such as education.

@article{liu2025_2502.17945,
  title={Assessing Large Language Models in Agentic Multilingual National Bias},
  author={Qianying Liu and Katrina Qiyao Wang and Fei Cheng and Sadao Kurohashi},
  journal={arXiv preprint arXiv:2502.17945},
  year={2025}
}