Bias Beyond English: Evaluating Social Bias and Debiasing Methods in a Low-Resource Setting

15 April 2025

Abstract

Social bias in language models can exacerbate social inequalities. Although the problem has garnered wide attention, most research focuses on English data. In low-resource scenarios, models often perform worse due to insufficient training data. This study aims to leverage high-resource language corpora to evaluate bias and experiment with debiasing methods in low-resource languages. We evaluated the performance of recent multilingual models in five languages: English (eng), Chinese (zho), Russian (rus), Indonesian (ind), and Thai (tha), and analyzed four bias dimensions: gender, religion, nationality, and race-color. By constructing multilingual bias evaluation datasets, this study allows fair comparisons between models across languages. We further investigated three debiasing methods (CDA, Dropout, and SenDeb) and demonstrated that debiasing methods from high-resource languages can be effectively transferred to low-resource ones, providing actionable insights for fairness research in multilingual NLP.

@article{zhou2025_2504.11183,
  title={Bias Beyond English: Evaluating Social Bias and Debiasing Methods in a Low-Resource Setting},
  author={Ej Zhou and Weiming Lu},
  journal={arXiv preprint arXiv:2504.11183},
  year={2025}
}