Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods

2 May 2025
Mahdi Dhaini
Ege Erdogan
Nils Feldhus
Gjergji Kasneci
Abstract

While research on applications and evaluations of explanation methods continues to expand, the fairness of explanation methods, that is, disparities in their performance across subgroups, remains an often overlooked aspect. In this paper, we address this gap by showing that, across three tasks and five language models, widely used post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. These disparities persist even when the models are pre-trained or fine-tuned on particularly unbiased datasets, indicating that the disparities we observe are not merely consequences of biased training data. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods, as such disparities can lead to biased outcomes against certain subgroups, with particularly critical implications in high-stakes contexts. Furthermore, our findings underscore the importance of incorporating the fairness of explanations, alongside overall model fairness and explainability, as a requirement in regulatory frameworks.
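The kind of subgroup comparison the abstract describes can be sketched in a few lines. The toy example below is not the paper's method: the linear model, the input-times-weight attribution rule, and the comprehensiveness-style faithfulness score are simplified stand-ins for the attribution methods and metrics the authors evaluate. It only illustrates the general recipe: score each example's explanation for faithfulness, then compare mean scores between two subgroups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier: p(y=1 | x) = sigmoid(w . x)
w = np.array([1.5, -2.0, 0.5, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X):
    return sigmoid(X @ w)

def attribute(x):
    # Simplified stand-in for a post-hoc attribution method
    # (e.g. SHAP or Integrated Gradients): input times weight.
    return x * w

def comprehensiveness(x, k=2):
    # Faithfulness proxy: how much the prediction drops after
    # zeroing out the k most highly attributed features.
    top = np.argsort(-np.abs(attribute(x)))[:k]
    x_masked = x.copy()
    x_masked[top] = 0.0
    return predict(x[None])[0] - predict(x_masked[None])[0]

# Synthetic inputs with a hypothetical binary subgroup label
# (standing in for gender annotations on real task data).
X = rng.normal(size=(200, 4))
group = rng.integers(0, 2, size=200)

scores = np.array([comprehensiveness(x) for x in X])
disparity = abs(scores[group == 0].mean() - scores[group == 1].mean())
print(f"mean faithfulness gap between subgroups: {disparity:.4f}")
```

On real data the gap would be tested for statistical significance rather than reported as a raw difference, and the same comparison would be repeated for robustness and complexity metrics.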

@article{dhaini2025_2505.01198,
  title={Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods},
  author={Mahdi Dhaini and Ege Erdogan and Nils Feldhus and Gjergji Kasneci},
  journal={arXiv preprint arXiv:2505.01198},
  year={2025}
}