Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy

Abstract

Language models are strong few-shot learners and achieve good overall accuracy on text classification tasks, masking the fact that their results suffer from severe class accuracy imbalance. We believe that the pursuit of overall accuracy should come not from enriching the strong classes but from raising the weak ones. To address the imbalance, we propose a Heaviside step function-based ensemble debiasing method, which enables flexible rectification of in-context learned class probabilities at both the class and sample levels. Evaluations with Llama-2-13B on seven text classification benchmarks show that our approach achieves state-of-the-art overall accuracy gains with balanced class accuracies. More importantly, we analyze the resulting probability correction scheme, showing that sample-level corrections are necessary to elevate weak classes. By effectively correcting weak classes, our method also brings significant performance gains to a larger model variant, Llama-2-70B, especially on a biomedical domain task, further demonstrating the necessity of ensemble debiasing at both levels.
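To make the idea concrete, here is a minimal sketch of a Heaviside-gated, two-level probability correction. It is an illustration under assumptions, not the authors' implementation: the function name, the choice of per-class multiplicative weights for the class-level correction, the confidence threshold, and the uniform-smoothing stand-in for the sample-level correction are all hypothetical.

```python
import numpy as np

def heaviside_debias(probs, class_weights, sample_threshold):
    """Hypothetical sketch of two-level debiasing.

    probs: (n_samples, n_classes) in-context learned class probabilities.
    class_weights: (n_classes,) multiplicative factors (class-level correction).
    sample_threshold: confidence cutoff; low-confidence samples additionally
        receive a sample-level correction (uniform smoothing as a stand-in).
    """
    # Class-level correction: reweight every sample's class probabilities.
    corrected = probs * class_weights
    # Heaviside gate: fires (=1) where a sample's top probability falls
    # below the threshold, selecting samples for sample-level correction.
    gate = np.heaviside(sample_threshold - probs.max(axis=1), 0.0)
    # Sample-level correction, applied only where the gate fires.
    smoothed = (corrected + 1.0 / probs.shape[1]) / 2.0
    corrected = np.where(gate[:, None] == 1.0, smoothed, corrected)
    # Renormalize each row back into a valid probability distribution.
    return corrected / corrected.sum(axis=1, keepdims=True)
```

The gate leaves confident predictions untouched while rectifying uncertain ones, which is the flexibility the abstract attributes to the Heaviside step function.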

@article{lin2025_2503.05157,
  title={Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy},
  author={Ruixi Lin and Ziqiao Wang and Yang You},
  journal={arXiv preprint arXiv:2503.05157},
  year={2025}
}