
What About Taking Policy as Input of Value Function: Policy-extended Value Function Approximator

Abstract

We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve the values of multiple policies, in contrast to the conventional VFA, which holds values for only one policy. This brings a new characteristic of value generalization among policies. From both theoretical and empirical lenses, we focus on value generalization along the policy improvement path (called local generalization), from which we derive a new form of Generalized Policy Iteration (GPI) with PeVFA. Besides, we introduce a representation learning framework for RL policy, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs by contrastive learning and action prediction. In our experiments, Proximal Policy Optimization (PPO) re-implemented with PeVFA outperforms its vanilla counterpart in several OpenAI Gym continuous control tasks, demonstrating the effectiveness of value generalization offered by PeVFA and policy representation learning.
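The core architectural idea of the abstract, a value network conditioned on a policy embedding in addition to the state, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the network sizes, the `pevfa_value` helper, and the use of a plain numpy MLP are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Initialize weights and biases for a small fully connected network."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, layers):
    """Forward pass with tanh hidden activations and a linear output."""
    for W, b in layers[:-1]:
        x = np.tanh(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

state_dim, policy_embed_dim = 4, 8  # illustrative dimensions (assumption)

# PeVFA idea: the value network takes the state concatenated with an
# explicit policy embedding, so a single network can represent the
# values of many policies rather than only the current one.
pevfa_layers = init_mlp([state_dim + policy_embed_dim, 32, 1])

def pevfa_value(state, policy_embedding):
    """Approximate V(s | pi) given a state and a policy embedding."""
    x = np.concatenate([state, policy_embedding])
    return float(mlp_forward(x, pevfa_layers)[0])

# The same state queried under two different policy embeddings
# yields two different value estimates.
s = rng.standard_normal(state_dim)
z_old = rng.standard_normal(policy_embed_dim)  # embedding of an earlier policy
z_new = rng.standard_normal(policy_embed_dim)  # embedding of an improved policy
v_old, v_new = pevfa_value(s, z_old), pevfa_value(s, z_new)
```

In this picture, the "local generalization" studied in the paper corresponds to the network producing a useful value estimate for `z_new` after having been trained on embeddings of earlier policies along the improvement path; how the embeddings themselves are learned (from network parameters or state-action pairs) is the subject of the paper's representation learning framework.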
