HypRL: Reinforcement Learning of Control Policies for Hyperproperties

We study the problem of learning control policies for complex tasks whose requirements are given by a hyperproperty. The use of hyperproperties is motivated by their power to formally specify requirements of multi-agent systems as well as requirements that inherently relate multiple execution traces (e.g., privacy and fairness). Given a Markov decision process M with unknown transitions (representing the environment) and a HyperLTL formula φ, our approach first employs Skolemization to handle quantifier alternations in φ. We introduce quantitative robustness functions for HyperLTL to define rewards over finite traces of M with respect to φ. Finally, we utilize a suitable reinforcement learning algorithm to learn (1) a policy per trace quantifier in φ, and (2) the probability distribution of transitions of M, which together maximize the expected reward and, hence, the probability of satisfying φ in M. We present a set of case studies on (1) safety-preserving multi-agent path planning, (2) fairness in resource allocation, and (3) the post-correspondence problem (PCP).
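To make the reward construction concrete, here is a minimal sketch (not the paper's implementation) of a quantitative robustness function for a simple two-trace HyperLTL-style safety requirement, used as a reward over finite traces. The formula, the function name, and the minimum-separation setting are illustrative assumptions: φ = ∀π. ∀π′. G(|x_π − x_π′| ≥ d), i.e., two agents must always keep separation at least d, with robustness taken as the worst-case margin over the finite prefix.

```python
from typing import Sequence

def separation_robustness(trace_a: Sequence[float],
                          trace_b: Sequence[float],
                          d: float) -> float:
    """Robustness of G(|x_pi - x_pi'| >= d) over a pair of finite traces.

    The globally operator G becomes a minimum over time steps; the atomic
    predicate contributes its margin |x - x'| - d. A positive result means
    the prefix satisfies the requirement, a negative one means violation.
    (Illustrative sketch; the formula and threshold d are assumptions.)
    """
    assert len(trace_a) == len(trace_b), "traces must be time-aligned"
    return min(abs(a - b) - d for a, b in zip(trace_a, trace_b))

if __name__ == "__main__":
    rollout_1 = [0.0, 1.0, 2.0, 3.0]   # positions of agent 1 over time
    rollout_2 = [5.0, 4.5, 4.0, 4.2]   # positions of agent 2 over time
    reward = separation_robustness(rollout_1, rollout_2, d=1.0)
    print(f"robustness-based reward: {reward:.2f}")  # > 0 => satisfied
```

Because the robustness value is a margin rather than a Boolean verdict, using it as the reward signal pushes the learner not just toward satisfying φ but toward maximizing the degree of satisfaction.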
@article{hsu2025_2504.04675,
  title   = {HypRL: Reinforcement Learning of Control Policies for Hyperproperties},
  author  = {Tzu-Han Hsu and Arshia Rafieioskouei and Borzoo Bonakdarpour},
  journal = {arXiv preprint arXiv:2504.04675},
  year    = {2025}
}