Papers citing 'Beyond Performance: Quantifying and Mitigating Label Bias in LLMs'

Title
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models Leander Girrbach Stephan Alaniz Genevieve Smith Trevor Darrell Zeynep Akata 148 1 0 04 Oct 2025
Hearing the Order: Investigating Selection Bias in Large Audio-Language Models Yu-Xiang Lin Chen-An Li Sheng-Lun Wei Po-Chun Chen Hsin-Hsi Chen Hung-yi Lee 100 0 0 01 Oct 2025
Metric assessment protocol in the context of answer fluctuation on MCQ tasks Ekaterina Goliakova X. Renard Marie-Jeanne Lesot Thibault Laugel Christophe Marsala Marcin Detyniecki 83 0 0 21 Jul 2025
PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation Eliya Habba Noam Dahan Gili Lior Gabriel Stanovsky LRM 270 1 0 20 Jul 2025
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions Weijie Xu Shixian Cui Xi Fang Chi Xue Stephanie Eckman Chandan K. Reddy ELM 239 4 0 31 May 2025
Through the LLM Looking Glass: A Socratic Probing of Donkeys, Elephants, and Markets Molly Kennedy Ayyoob Imani Timo Spinde Hinrich Schütze 228 1 0 20 Mar 2025
Towards AI-assisted Academic Writing Daniel J. Liebling Malcolm Kane Madeleine Grunde-Mclaughlin Ian J. Lang Subhashini Venugopalan Michael P. Brenner 196 3 0 17 Mar 2025
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Eliya Habba Ofir Arviv Itay Itzhak Yotam Perlitz Elron Bandel Leshem Choshen Michal Shmueli-Scheuer Gabriel Stanovsky 275 10 0 03 Mar 2025
Aligning Black-box Language Models with Human Judgments. North American Chapter of the Association for Computational Linguistics (NAACL), 2025 Gerrit J. J. van den Burg Gen Suzuki Wei Liu Murat Sensoy ALM 239 2 0 07 Feb 2025
Improving Model Evaluation using SMART Filtering of Benchmark Datasets. North American Chapter of the Association for Computational Linguistics (NAACL), 2024 Vipul Gupta Candace Ross David Pantoja R. Passonneau Megan Ung Adina Williams 642 11 0 26 Oct 2024
Mitigating Selection Bias with Node Pruning and Auxiliary Options. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Hyeong Kyu Choi Weijie Xu Chi Xue Stephanie Eckman Chandan K. Reddy 362 10 0 27 Sep 2024
Self-Recognition in Language Models Tim R. Davidson Viacheslav Surkov V. Veselovsky Giuseppe Russo Robert West Çağlar Gülçehre PILM 475 8 0 09 Jul 2024