Beyond Performance: Quantifying and Mitigating Label Bias in LLMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

4 May 2024

Papers citing "Beyond Performance: Quantifying and Mitigating Label Bias in LLMs"

14 / 14 papers shown

Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices Paulo Cavalin Cassia Sanctos Marcelo Grave Claudio S. Pinhanez Yago Primerano 8 0 0 26 Nov 2025
Quantifying and Mitigating Selection Bias in LLMs: A Transferable LoRA Fine-Tuning and Efficient Majority Voting Approach Blessed Guda Lawrence Francis Gabrial Zencha A. Carlee Joe-Wong Moise Busogi 8 0 0 17 Nov 2025
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models Leander Girrbach Stephan Alaniz Genevieve Smith Trevor Darrell Zeynep Akata 181 1 0 04 Oct 2025
Hearing the Order: Investigating Selection Bias in Large Audio-Language Models Yu-Xiang Lin Chen-An Li Sheng-Lun Wei Po-Chun Chen Hsin-Hsi Chen Hung-yi Lee 108 0 0 01 Oct 2025
Metric assessment protocol in the context of answer fluctuation on MCQ tasks Ekaterina Goliakova X. Renard Marie-Jeanne Lesot Thibault Laugel Christophe Marsala Marcin Detyniecki 107 0 0 21 Jul 2025
PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation Eliya Habba Noam Dahan Gili Lior Gabriel Stanovsky LRM 310 1 0 20 Jul 2025
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions Weijie Xu Shixian Cui Xi Fang Chi Xue Stephanie Eckman Chandan K. Reddy ELM 243 4 0 31 May 2025
Through the LLM Looking Glass: A Socratic Probing of Donkeys, Elephants, and Markets Molly Kennedy Ayyoob Imani Timo Spinde Hinrich Schütze 236 0 0 20 Mar 2025
Towards AI-assisted Academic Writing Daniel J. Liebling Malcolm Kane Madeleine Grunde-Mclaughlin Ian J. Lang Subhashini Venugopalan Michael P. Brenner 196 3 0 17 Mar 2025
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Eliya Habba Ofir Arviv Itay Itzhak Yotam Perlitz Elron Bandel Leshem Choshen Michal Shmueli-Scheuer Gabriel Stanovsky 311 10 0 03 Mar 2025
Aligning Black-box Language Models with Human Judgments North American Chapter of the Association for Computational Linguistics (NAACL), 2025 Gerrit J. J. van den Burg Gen Suzuki Wei Liu Murat Sensoy ALM 239 2 0 07 Feb 2025
Improving Model Evaluation using SMART Filtering of Benchmark Datasets North American Chapter of the Association for Computational Linguistics (NAACL), 2024 Vipul Gupta Candace Ross David Pantoja R. Passonneau Megan Ung Adina Williams 650 11 0 26 Oct 2024
Mitigating Selection Bias with Node Pruning and Auxiliary Options Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Hyeong Kyu Choi Weijie Xu Chi Xue Stephanie Eckman Chandan K. Reddy 362 10 0 27 Sep 2024
Self-Recognition in Language Models Tim R. Davidson Viacheslav Surkov V. Veselovsky Giuseppe Russo Robert West Çağlar Gülçehre PILM 475 8 0 09 Jul 2024