Bandit Max-Min Fair Allocation

In this paper, we study a new decision-making problem called the bandit max-min fair allocation (BMMFA) problem. The goal of this problem is to maximize the minimum utility among agents with additive valuations by repeatedly assigning indivisible goods to them. One key feature of this problem is that each agent's valuation for each item can only be observed through semi-bandit feedback, whereas existing work assumes that the item values are provided at the beginning of each round. Another key feature is that the algorithm's reward function is not additive with respect to rounds, unlike in most bandit-setting problems. Our first contribution is to propose an algorithm that has an asymptotic regret bound of , where is the number of agents, is the number of items, and is the time horizon. The algorithm is based on a novel combination of bandit techniques and a resource allocation algorithm studied in the literature on competitive analysis. Our second contribution is to provide a regret lower bound of . When is sufficiently larger than , the gap between the upper and lower bounds is a logarithmic factor of .
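To make the problem setting concrete, the following is a minimal sketch of one reading of the objective, not the authors' algorithm. It assumes that an agent's utility is the sum of the values of the items it receives over all rounds, and that the quantity of interest is the minimum of these cumulative utilities. All names (`round_utilities`, `semi_bandit_feedback`, `min_cumulative_utility`, `sum_of_round_minima`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# values[t][i][j]: value of item j to agent i in round t (hypothetical toy data).
Values = List[List[List[float]]]
# assignment[j]: index of the agent who receives item j in a given round.
Assignment = List[int]


def round_utilities(values_t: List[List[float]], assignment: Assignment,
                    n_agents: int) -> List[float]:
    """Additive utility each agent obtains in a single round."""
    utilities = [0.0] * n_agents
    for item, agent in enumerate(assignment):
        utilities[agent] += values_t[agent][item]
    return utilities


def semi_bandit_feedback(values_t: List[List[float]],
                         assignment: Assignment) -> Dict[Tuple[int, int], float]:
    """Only the values of (agent, item) pairs that were actually assigned are revealed."""
    return {(agent, item): values_t[agent][item]
            for item, agent in enumerate(assignment)}


def min_cumulative_utility(values: Values, assignments: List[Assignment],
                           n_agents: int) -> float:
    """Assumed objective: the minimum over agents of utility summed over all rounds."""
    totals = [0.0] * n_agents
    for values_t, assignment in zip(values, assignments):
        for agent, u in enumerate(round_utilities(values_t, assignment, n_agents)):
            totals[agent] += u
    return min(totals)


def sum_of_round_minima(values: Values, assignments: List[Assignment],
                        n_agents: int) -> float:
    """A round-additive quantity, shown only for contrast with the objective above."""
    return sum(min(round_utilities(v, a, n_agents))
               for v, a in zip(values, assignments))


# Toy instance: 2 agents, 1 item, 2 rounds; the item is worth 1 to either agent.
values = [[[1.0], [1.0]], [[1.0], [1.0]]]
# Alternate the single item between the two agents.
assignments = [[0], [1]]

print(min_cumulative_utility(values, assignments, 2))
print(sum_of_round_minima(values, assignments, 2))
print(semi_bandit_feedback(values[0], assignments[0]))
```

In this toy instance, alternating the item leaves one agent with nothing in every single round, yet both agents end with the same cumulative utility. The two printed quantities therefore differ, which illustrates why a reward defined through the minimum is not additive with respect to rounds.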
@article{harada2025_2505.05169,
  title={Bandit Max-Min Fair Allocation},
  author={Tsubasa Harada and Shinji Ito and Hanna Sumita},
  journal={arXiv preprint arXiv:2505.05169},
  year={2025}
}