
HypRL: Reinforcement Learning of Control Policies for Hyperproperties

Abstract

We study the problem of learning control policies for complex tasks whose requirements are given by a hyperproperty. The use of hyperproperties is motivated by their significant power to formally specify requirements of multi-agent systems as well as requirements whose expression needs reasoning over multiple execution traces (e.g., privacy and fairness). Given a Markov decision process M with unknown transitions (representing the environment) and a HyperLTL formula φ, our approach first employs Skolemization to handle quantifier alternations in φ. We introduce quantitative robustness functions for HyperLTL to define rewards of finite traces of M with respect to φ. Finally, we utilize a suitable reinforcement learning algorithm to learn (1) a policy per trace quantifier in φ, and (2) the probability distribution of transitions of M that together maximize the expected reward and, hence, the probability of satisfaction of φ in M. We present a set of case studies on (1) safety-preserving multi-agent path planning, (2) fairness in resource allocation, and (3) the post-correspondence problem (PCP).
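To give a sense of what a quantitative robustness function over multiple traces looks like, here is a minimal sketch. It is an illustration only, not the paper's definitions: the function name, the specific two-trace safety formula, and the 1-D setting are all assumptions. It scores a requirement of the shape "for all pairs of traces, the agents always keep a minimum separation", where the temporal operator G (globally) becomes a minimum over time and the atomic predicate becomes a signed margin, so a positive score indicates satisfaction with slack and a negative score indicates violation.

```python
# Illustrative sketch (not the paper's exact definitions): a quantitative
# robustness score for a two-trace safety requirement of the form
#   forall pi1. forall pi2. G (distance(pi1, pi2) >= d_min),
# i.e. two agents must always maintain a minimum separation.
def pairwise_safety_robustness(trace1, trace2, d_min):
    """trace1, trace2: equal-length lists of 1-D positions (floats).

    Returns the worst-case signed margin over time: positive means the
    safety requirement holds with that much slack, negative means it is
    violated by that much at the worst time step.
    """
    # G (globally) maps to a minimum over time steps; the atomic predicate
    # distance >= d_min maps to the signed margin distance - d_min.
    return min(abs(x1 - x2) - d_min for x1, x2 in zip(trace1, trace2))

# Two trajectories that keep separation >= 2 at every step.
safe = pairwise_safety_robustness([0.0, 1.0, 2.0], [4.0, 5.0, 6.0], d_min=2.0)
# A pair that collides at the last step.
unsafe = pairwise_safety_robustness([0.0, 1.0, 2.0], [4.0, 3.0, 2.0], d_min=2.0)
```

A score like this can serve directly as a (shaped) reward signal: maximizing expected robustness pushes the learned policies toward trace combinations that satisfy the formula with margin.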

@article{hsu2025_2504.04675,
  title={HypRL: Reinforcement Learning of Control Policies for Hyperproperties},
  author={Tzu-Han Hsu and Arshia Rafieioskouei and Borzoo Bonakdarpour},
  journal={arXiv preprint arXiv:2504.04675},
  year={2025}
}