Language models are prone to dataset biases, known as shortcuts or spurious correlations in data, which often result in performance drops on new data. We present a new debiasing framework called "FairFlow" that mitigates dataset biases by learning to be undecided in its predictions for data samples or representations associated with known or unknown biases. The framework introduces two key components: a suite of data and model perturbation operations that generate different biased views of input samples, and a contrastive objective that learns debiased and robust representations from the resulting biased views. Experiments show that FairFlow outperforms existing debiasing methods, particularly on out-of-domain and hard test samples, without compromising in-domain performance.
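The abstract does not give the exact loss, so the following is only a minimal sketch of the "undecided learning" idea it describes: a standard task loss on the original input combined with a term that pushes predictions on biased views of that input toward the uniform (undecided) distribution. The class name, the KL-to-uniform formulation, and the weighting are assumptions for illustration; the paper's contrastive objective over biased views is not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F


class UndecidedDebiasingLoss(nn.Module):
    """Illustrative objective (assumed, not the paper's exact formulation):
    cross-entropy on the clean input plus an 'undecidedness' penalty that
    drives predictions on biased views toward the uniform distribution."""

    def __init__(self, num_classes: int, undecided_weight: float = 1.0):
        super().__init__()
        self.num_classes = num_classes
        self.undecided_weight = undecided_weight

    def forward(self, clean_logits, biased_view_logits, labels):
        # clean_logits: (batch, num_classes) -- predictions on original inputs
        # biased_view_logits: (batch, n_views, num_classes) -- one row per biased view
        # labels: (batch,) -- gold labels for the task loss
        task_loss = F.cross_entropy(clean_logits, labels)

        # KL(uniform || p) encourages near-uniform, i.e. undecided,
        # predictions on views that expose a known or synthesized bias.
        log_probs = F.log_softmax(biased_view_logits, dim=-1)
        uniform = torch.full_like(log_probs, 1.0 / self.num_classes)
        undecided_loss = F.kl_div(log_probs, uniform, reduction="batchmean")

        return task_loss + self.undecided_weight * undecided_loss


if __name__ == "__main__":
    # Toy usage: 8 samples, 3 classes, 4 biased views per sample (all hypothetical).
    loss_fn = UndecidedDebiasingLoss(num_classes=3)
    clean_logits = torch.randn(8, 3)
    biased_view_logits = torch.randn(8, 4, 3)
    labels = torch.randint(0, 3, (8,))
    print(loss_fn(clean_logits, biased_view_logits, labels))

The weight on the undecidedness term controls the trade-off between in-domain accuracy and robustness to the biased views; how FairFlow balances these in practice is described in the paper itself.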
@article{cheng2025_2503.17632,
  title   = {FairFlow: Mitigating Dataset Biases through Undecided Learning},
  author  = {Jiali Cheng and Hadi Amiri},
  journal = {arXiv preprint arXiv:2503.17632},
  year    = {2025}
}