HypRL: Reinforcement Learning of Control Policies for Hyperproperties

We study the problem of learning control policies for complex tasks whose requirements are given by a hyperproperty. The use of hyperproperties is motivated by their power to formally specify requirements of multi-agent systems as well as requirements that inherently relate multiple execution traces (e.g., privacy and fairness). Given a Markov decision process M with unknown transitions (representing the environment) and a HyperLTL formula φ, our approach first employs Skolemization to handle quantifier alternations in φ. We introduce quantitative robustness functions for HyperLTL to define rewards over finite traces of M with respect to φ. Finally, we utilize a suitable reinforcement learning algorithm to learn (1) a policy per trace quantifier in φ, and (2) the probability distribution of transitions of M, which together maximize the expected reward and, hence, the probability of satisfying φ in M. We present a set of case studies on (1) safety-preserving multi-agent path planning, (2) fairness in resource allocation, and (3) the post-correspondence problem (PCP).
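To make the reward construction concrete, here is a minimal sketch (not the paper's implementation) of a quantitative robustness function for a simple two-trace HyperLTL-style safety requirement, used as a reward over finite traces. The formula, the function name, and the minimum-separation setting are illustrative assumptions: φ = ∀π. ∀π′. G(|x_π − x_π′| ≥ d), i.e., two agents must always keep separation at least d, with robustness taken as the worst-case margin over the finite prefix.

```python
from typing import Sequence

def separation_robustness(trace_a: Sequence[float],
                          trace_b: Sequence[float],
                          d: float) -> float:
    """Robustness of G(|x_pi - x_pi'| >= d) over a pair of finite traces.

    The globally operator G becomes a minimum over time steps; the atomic
    predicate contributes its margin |x - x'| - d. A positive result means
    the prefix satisfies the requirement, a negative one means violation.
    (Illustrative sketch; the formula and threshold d are assumptions.)
    """
    assert len(trace_a) == len(trace_b), "traces must be time-aligned"
    return min(abs(a - b) - d for a, b in zip(trace_a, trace_b))

if __name__ == "__main__":
    rollout_1 = [0.0, 1.0, 2.0, 3.0]   # positions of agent 1 over time
    rollout_2 = [5.0, 4.5, 4.0, 4.2]   # positions of agent 2 over time
    reward = separation_robustness(rollout_1, rollout_2, d=1.0)
    print(f"robustness-based reward: {reward:.2f}")  # > 0 => satisfied
```

Because the robustness value is a margin rather than a Boolean verdict, using it as the reward signal pushes the learner not just toward satisfying φ but toward maximizing the degree of satisfaction.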
@article{hsu2025_2504.04675,
  title   = {HypRL: Reinforcement Learning of Control Policies for Hyperproperties},
  author  = {Tzu-Han Hsu and Arshia Rafieioskouei and Borzoo Bonakdarpour},
  journal = {arXiv preprint arXiv:2504.04675},
  year    = {2025}
}