A Weighted-likelihood framework for class imbalance in Bayesian prediction models

23 April 2025

Abstract

Class imbalance occurs when data used for training classification models has a different number of observations or samples within each category or class. Models built on such data can be biased towards the majority class and have poor predictive performance and generalisation for the minority class. We propose a Bayesian weighted-likelihood (power-likelihood) approach to deal with class imbalance: each observation's likelihood is raised to a weight inversely proportional to its class proportion, with weights normalized to sum to the number of samples. This embeds cost-sensitive learning directly into Bayesian updating and is applicable to binary, multinomial and ordered logistic prediction models. Example models are implemented in Stan, PyMC, and this http URL, and all code and reproducible scripts are archived on GitHub: this https URL. This approach is simple to implement and extends naturally to arbitrary error-cost matrices.
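
The weighting scheme described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's code: the function names `class_weights` and `weighted_log_lik` and the toy data are hypothetical, and the binary Bernoulli likelihood is just one of the model types the abstract mentions. Each weight is taken as the inverse of the observation's class proportion and then rescaled so that the weights sum to the number of samples; the weighted log-likelihood is the sum of each observation's log-likelihood multiplied by its weight, which is the log of the likelihood raised to that weight.

```python
import math
from collections import Counter


def class_weights(labels):
    """Weights inversely proportional to class proportion,
    rescaled so they sum to the number of observations."""
    n = len(labels)
    counts = Counter(labels)
    # Inverse class proportion for each observation: 1 / (n_k / n).
    raw = [n / counts[y] for y in labels]
    # Rescale so that the weights sum to n.
    scale = n / sum(raw)
    return [r * scale for r in raw]


def weighted_log_lik(labels, probs, weights):
    """Log of the power-likelihood for a binary outcome:
    sum_i w_i * log p(y_i), where probs[i] is P(y_i = 1)."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        total += w * math.log(p if y == 1 else 1.0 - p)
    return total


# Toy data (hypothetical): 8 majority-class and 2 minority-class observations.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
w = class_weights(y)

# Placeholder predicted probabilities, only to exercise the function.
p = [0.5] * len(y)
ll = weighted_log_lik(y, p, w)
```

With these toy labels, each majority-class observation receives weight 0.625 and each minority-class observation receives weight 2.5, so both classes contribute the same total weight (5 each) and the weights sum to 10, the sample size. In a probabilistic programming language, the same quantity would typically be added to the log posterior in place of the unweighted log-likelihood.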


@article{lazic2025_2504.17013,
  title={A Weighted-likelihood framework for class imbalance in Bayesian prediction models},
  author={Stanley E. Lazic},
  journal={arXiv preprint arXiv:2504.17013},
  year={2025}
}
