Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text

5 May 2025
Jennifer Healey
Laurie Byrum
Md Nadeem Akhtar
Surabhi Bhargava
Moumita Sinha
Abstract

LLM evaluation is challenging even in the case of base models. In real-world deployments, evaluation is further complicated by the interplay of task-specific prompts and experiential context. At scale, bias evaluation is often based on short-context, fixed-choice benchmarks that can be rapidly evaluated; however, these can lose validity when the LLM's deployed context differs. Large-scale human evaluation is often seen as intractable and too costly. Here we present our journey toward developing a semi-automated bias evaluation framework for free-text responses that has human insights at its core. We discuss how we developed an operational definition of bias that helped us automate our pipeline, as well as a methodology for classifying bias beyond multiple choice. We additionally comment on how human evaluation helped us uncover problematic templates in a bias benchmark.
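The abstract does not spell out the pipeline's mechanics, but a minimal sketch can illustrate the general shape of a semi-automated triage step with human review at its core: an automated pass labels the easy cases and escalates anything it cannot decide to human annotators. Everything here is hypothetical, not the authors' method; the label names, the keyword heuristic, and the routing logic are placeholder assumptions.

from dataclasses import dataclass

@dataclass
class Response:
    prompt_id: str
    text: str
    label: str = "unlabeled"
    needs_human_review: bool = False

def classify_response(resp: Response, stereotyped_terms: set[str]) -> Response:
    """Automated first pass (toy heuristic): flag responses mentioning any
    stereotyped term; anything the heuristic cannot decide is escalated."""
    hits = [t for t in stereotyped_terms if t in resp.text.lower()]
    if not hits:
        resp.label = "no_bias_detected"
    else:
        # A surface keyword match cannot distinguish a response that
        # reinforces a stereotype from one that explicitly rejects it,
        # so route it to the human-review queue.
        resp.label = "possible_bias"
        resp.needs_human_review = True
    return resp

def triage(responses: list[Response], stereotyped_terms: set[str]):
    """Split responses into an auto-labeled set and a human-review queue."""
    auto, queue = [], []
    for r in responses:
        r = classify_response(r, stereotyped_terms)
        (queue if r.needs_human_review else auto).append(r)
    return auto, queue

if __name__ == "__main__":
    demo = [Response("q1", "The nurse said she was tired."),
            Response("q2", "Both candidates were equally qualified.")]
    auto, queue = triage(demo, {"nurse"})
    print(len(auto), "auto-labeled;", len(queue), "queued for human review")

The design point the sketch is meant to convey is the division of labor: automation handles the unambiguous bulk cheaply, while the genuinely hard judgments about free-text bias remain with human evaluators.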

@article{healey2025_2505.03053,
  title={Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text},
  author={Jennifer Healey and Laurie Byrum and Md Nadeem Akhtar and Surabhi Bhargava and Moumita Sinha},
  journal={arXiv preprint arXiv:2505.03053},
  year={2025}
}