
Computational Hardness of Reinforcement Learning with Partial $q^\pi$-Realizability

Main: 9 pages · Appendix: 13 pages · Bibliography: 2 pages · 1 figure · 4 tables
Abstract

This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^{\pi}$-realizability. In this framework, the objective is to learn an $\epsilon$-optimal policy with respect to a predefined policy set $\Pi$, under the assumption that the value functions of all policies in $\Pi$ are linearly realizable. The assumptions of this framework are weaker than those of $q^{\pi}$-realizability but stronger than those of $q^{*}$-realizability, providing a practical model in which function approximation naturally arises. We prove that learning an $\epsilon$-optimal policy in this setting is computationally hard. Specifically, we establish NP-hardness under a parameterized greedy (argmax) policy set and show that, unless NP = RP, an exponential lower bound (in the feature dimension) holds when the policy set contains softmax policies, under the Randomized Exponential Time Hypothesis. Our hardness results mirror those in $q^{*}$-realizability and suggest that the computational difficulty persists even when $\Pi$ is expanded beyond the optimal policy. To establish this, we reduce two complexity problems, $\delta$-Max-3SAT and $\delta$-Max-3SAT(b), to instances of GLinear-$\kappa$-RL (greedy policy) and SLinear-$\kappa$-RL (softmax policy). Our findings indicate that positive computational results are generally unattainable under partial $q^{\pi}$-realizability, in contrast to $q^{\pi}$-realizability under a generative access model.
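As a rough sketch of the setting (the feature map $\varphi$, parameter vectors $\theta_{\pi}$, dimension $d$, and initial state $s_0$ are assumed notation, not necessarily the paper's exact definitions), partial $q^{\pi}$-realizability asks that the action-value function of every policy in the given set $\Pi$ be linear in known features, and the learner is judged only against the best policy in $\Pi$:

$$
\forall\, \pi \in \Pi,\ \forall\, (s,a):\quad q^{\pi}(s,a) = \langle \varphi(s,a),\, \theta_{\pi} \rangle, \qquad \varphi(s,a),\, \theta_{\pi} \in \mathbb{R}^{d},
$$

with an $\epsilon$-optimal output $\hat{\pi}$ required to satisfy $v^{\hat{\pi}}(s_0) \ge \max_{\pi \in \Pi} v^{\pi}(s_0) - \epsilon$. Informally, taking $\Pi$ to be the set of all policies recovers the stronger $q^{\pi}$-realizability assumption, while shrinking $\Pi$ toward an optimal policy approaches the weaker $q^{*}$-realizability assumption, which is how the framework sits between the two.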
