arXiv:2405.20678

No-Regret Learning for Fair Multi-Agent Social Welfare Optimization

31 May 2024
Mengxiao Zhang
Ramiro Deo-Campo Vuong
Haipeng Luo
Abstract

We consider the problem of online multi-agent Nash social welfare (NSW) maximization. While previous works of Hossain et al. [2021], Jones et al. [2023] study similar problems in stochastic multi-agent multi-armed bandits and show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness measure is the product of all agents' rewards, instead of their NSW (that is, their geometric mean). Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. Specifically, in stochastic $N$-agent $K$-armed bandits, we develop an algorithm with $\widetilde{\mathcal{O}}\left(K^{\frac{2}{N}}T^{\frac{N-1}{N}}\right)$ regret and prove that the dependence on $T$ is tight, making it a sharp contrast to the $\sqrt{T}$-regret bounds of Hossain et al. [2021], Jones et al. [2023]. We then consider a more challenging version of the problem with adversarial rewards. Somewhat surprisingly, despite NSW being a concave function, we prove that no algorithm can achieve sublinear regret. To circumvent such negative results, we further consider a setting with full-information feedback and design two algorithms with $\sqrt{T}$-regret: the first one has no dependence on $N$ at all and is applicable to not just NSW but a broad class of welfare functions, while the second one has better dependence on $K$ and is preferable when $N$ is small. Finally, we also show that logarithmic regret is possible whenever there exists one agent who is indifferent about different arms.
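The distinction between the two fairness measures mentioned above can be illustrated with a minimal sketch. It assumes a model the abstract does not spell out: the learner picks a distribution `p` over the $K$ arms, and agent $n$'s expected reward is the inner product of `p` with that agent's utility row `u[n]`. The function names and the toy utility matrix are hypothetical, chosen only for illustration.

```python
import math

def expected_rewards(p, u):
    """Expected reward of each agent under arm distribution p.

    Assumed model (not stated in the abstract): u[n][k] is agent n's
    mean reward for arm k, and p[k] is the probability of pulling arm k.
    """
    return [sum(pk * unk for pk, unk in zip(p, row)) for row in u]

def product_welfare(p, u):
    """Product of all agents' expected rewards."""
    return math.prod(expected_rewards(p, u))

def nash_social_welfare(p, u):
    """Geometric mean of all agents' expected rewards (NSW)."""
    rewards = expected_rewards(p, u)
    return math.prod(rewards) ** (1.0 / len(rewards))

# Toy instance: N = 2 agents, K = 2 arms, each agent likes a different arm.
u = [[1.0, 0.0],
     [0.0, 1.0]]
uniform = [0.5, 0.5]   # splits pulls evenly between the arms
greedy = [1.0, 0.0]    # always serves agent 0 only

print(product_welfare(uniform, u))      # product of (0.5, 0.5)
print(nash_social_welfare(uniform, u))  # geometric mean of (0.5, 0.5)
print(nash_social_welfare(greedy, u))   # one agent gets nothing
```

Because the geometric mean is a monotone transform of the product, both objectives rank distributions the same way; they differ in scale, which is why regret measured against one does not directly translate into regret against the other.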
