ResearchTrend.AI
GenderBench: Evaluation Suite for Gender Biases in LLMs

17 May 2025
Matúš Pikuliak
arXiv (abs) · PDF · HTML
Main: 8 pages · Appendix: 2 pages · Bibliography: 3 pages · 3 figures · 4 tables
Abstract

We present GenderBench -- a comprehensive evaluation suite designed to measure gender biases in LLMs. GenderBench includes 14 probes that quantify 19 gender-related harmful behaviors exhibited by LLMs. We release GenderBench as an open-source and extensible library to improve the reproducibility and robustness of benchmarking across the field. We also publish our evaluation of 12 LLMs. Our measurements reveal consistent patterns in their behavior. We show that LLMs struggle with stereotypical reasoning, equitable gender representation in generated texts, and occasionally also with discriminatory behavior in high-stakes scenarios, such as hiring.

@article{pikuliak2025_2505.12054,
  title={GenderBench: Evaluation Suite for Gender Biases in LLMs},
  author={Matúš Pikuliak},
  journal={arXiv preprint arXiv:2505.12054},
  year={2025}
}