
ID policy (with reassignment) is asymptotically optimal for heterogeneous weakly-coupled MDPs

Abstract

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, a natural adaptation of the ID policy, although originally proposed for a homogeneous special case of WCMDPs, in fact achieves an $O(1/\sqrt{N})$ optimality gap in long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our techniques highlight the construction of a novel projection-based Lyapunov function, which witnesses the convergence of rewards and costs to an optimal region in the presence of heterogeneity.
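
For concreteness, the following is a minimal sketch of a generic weakly-coupled MDP formulation and of the form the guarantee above takes; the notation (per-arm rewards $r_n$, resource-consumption functions $c_{n,k}$, budgets $\alpha_k N$, and the quantities $R^{\mathrm{opt}}(N)$, $R^{\mathrm{ID}}(N)$) is illustrative and need not match the paper's.

\begin{align*}
  &\text{Arm } n \in \{1,\dots,N\} \text{ has its own MDP } (\mathbb{S}_n, \mathbb{A}_n, P_n, r_n)
   \text{ (distinct parameters in the fully heterogeneous setting).} \\
  &\text{Per-step linking constraints over } K \text{ shared resources:}\quad
    \sum_{n=1}^{N} c_{n,k}\bigl(S_n(t), A_n(t)\bigr) \le \alpha_k N,
    \qquad k = 1,\dots,K. \\
  &\text{Objective (long-run average reward per arm):}\quad
    \max_{\pi}\ \liminf_{T\to\infty} \frac{1}{T}
    \sum_{t=0}^{T-1} \frac{1}{N} \sum_{n=1}^{N}
    \mathbb{E}_{\pi}\bigl[r_n\bigl(S_n(t), A_n(t)\bigr)\bigr]. \\
  &\text{Main result (informal):}\quad
    R^{\mathrm{opt}}(N) - R^{\mathrm{ID}}(N) = O\bigl(1/\sqrt{N}\bigr).
\end{align*}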

@article{zhang2025_2502.06072,
  title={ID policy (with reassignment) is asymptotically optimal for heterogeneous weakly-coupled MDPs},
  author={Xiangcheng Zhang and Yige Hong and Weina Wang},
  journal={arXiv preprint arXiv:2502.06072},
  year={2025}
}