Bias Similarity Across Large Language Models

Bias in machine learning models, particularly in Large Language Models, is a critical issue as these systems shape important societal decisions. While previous studies have examined bias in individual LLMs, comparisons of bias across models remain underexplored. To address this gap, we analyze 13 LLMs from five families, evaluating bias through output distribution across multiple dimensions using two datasets (4K and 1M questions). Our results show that fine-tuning has minimal impact on output distributions, and proprietary models tend to overly response as unknowns to minimize bias, compromising accuracy and utility. In addition, open-source models like Llama3-Chat and Gemma2-it demonstrate fairness comparable to proprietary models like GPT-4, challenging the assumption that larger, closed-source models are inherently less biased. We also find that bias scores for disambiguated questions are more extreme, raising concerns about reverse discrimination. These findings highlight the need for improved bias mitigation strategies and more comprehensive evaluation metrics for fairness in LLMs.
View on arXiv@article{jeong2025_2410.12010, title={ Bias Similarity Across Large Language Models }, author={ Hyejun Jeong and Shiqing Ma and Amir Houmansadr }, journal={arXiv preprint arXiv:2410.12010}, year={ 2025 } }