Deep Reinforcement Learning with Surrogate Agent-Environment Interface
Abstract
In this paper we propose the surrogate agent-environment interface (SAEI) for reinforcement learning. We also state that learning based on a probability surrogate agent-environment interface gives an optimal policy of the task agent-environment interface. We introduce the surrogate probability action and develop the probability surrogate action deterministic policy gradient (PSADPG) algorithm based on SAEI. This algorithm enables continuous control of discrete actions. The experiments show that PSADPG achieves the performance of DQN on certain tasks while exhibiting the nature of a stochastic optimal policy in the initial training stage.
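One possible reading of the surrogate probability action can be sketched as follows. This is an illustration, not the paper's implementation: it assumes the agent emits a continuous vector with one entry per discrete action, and a surrogate layer turns that vector into a probability distribution and samples the discrete action passed to the task environment. The names `normalize`, `sample_discrete_action`, `SurrogateEnv`, and the `task_env.step(action)` interface are all hypothetical.

```python
import random


def normalize(raw):
    """Clip negative entries to zero and rescale so the entries sum to 1."""
    clipped = [max(0.0, x) for x in raw]
    total = sum(clipped)
    if total == 0.0:
        # Assumption: fall back to a uniform distribution when nothing is positive.
        return [1.0 / len(raw)] * len(raw)
    return [x / total for x in clipped]


def sample_discrete_action(probs, rng):
    """Draw one discrete action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


class SurrogateEnv:
    """Hypothetical wrapper: accepts a continuous surrogate action and
    forwards a sampled discrete action to the wrapped task environment."""

    def __init__(self, task_env, seed=None):
        self.task_env = task_env
        self.rng = random.Random(seed)

    def step(self, surrogate_action):
        probs = normalize(surrogate_action)
        action = sample_discrete_action(probs, self.rng)
        return self.task_env.step(action)
```

Under this assumption, a deterministic policy gradient method could act in the continuous space of surrogate actions while the task environment still receives only discrete actions.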
