Goal Kernel Planning: Linearly-Solvable Non-Markovian Policies for Logical Tasks with Goal-Conditioned Options
In hierarchical planning, compositionality, abstraction, and task transfer are crucial for designing algorithms that can efficiently solve a variety of problems with maximal representational reuse. Many real-world problems require non-Markovian policies to handle complex structured tasks with logical conditions, often leading to prohibitively large state representations; this demands efficient methods for decomposing these problems and reusing structure across tasks. To this end, we introduce a compositional framework called Linearly-Solvable Goal Kernel Dynamic Programming (LS-GKDP) to address the complexity of solving non-Markovian Boolean sub-goal tasks with ordering constraints. LS-GKDP combines the Linearly-Solvable Markov Decision Process (LMDP) formalism with the Options Framework of Reinforcement Learning. LMDPs can be solved efficiently as a principal eigenvector problem, and options are policies with termination conditions that serve as temporally extended actions; with LS-GKDP we extend LMDPs to control over options for logical tasks. This involves decomposing a high-dimensional problem into a set of goal-conditioned options, one per goal, and constructing a goal kernel: an abstract transition kernel that jumps from an option's initiation states to its termination states while updating the higher-level task state. We show how an LMDP equipped with a goal kernel enables efficient optimization of meta-policies in a lower-dimensional subspace defined by the task grounding. Options can also be remapped to new problems within a super-exponential space of tasks without significant recomputation, and we identify cases where the solution is invariant to the task grounding, permitting zero-shot task transfer.
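As context for the claim that LMDPs reduce to a principal eigenvector problem, the sketch below shows the standard linearly-solvable MDP solve in the sense of Todorov's LMDPs, not the LS-GKDP algorithm itself: power iteration on diag(exp(-q)) P recovers the desirability function z = exp(-v), from which the optimal controlled dynamics follow in closed form. The function name, arguments, and toy example are illustrative assumptions, not an API from the paper.

```python
import numpy as np

def lmdp_desirability(P, q, tol=1e-10, max_iter=10_000):
    """Solve an infinite-horizon LMDP by power iteration.

    The desirability function z = exp(-v) is the principal
    eigenvector of G @ P, where G = diag(exp(-q)), q holds the
    per-state costs, and P is the uncontrolled (passive)
    row-stochastic transition kernel.
    """
    M = np.diag(np.exp(-q)) @ P
    z = np.ones(P.shape[0])
    for _ in range(max_iter):
        z_new = M @ z
        z_new /= np.linalg.norm(z_new)  # normalize; the growth rate is the eigenvalue
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    # Optimal controlled dynamics in closed form: u*(s'|s) ∝ P(s'|s) z(s')
    u = P * z[None, :]
    u /= u.sum(axis=1, keepdims=True)
    return z, u

# Hypothetical example: a 3-state chain with a cheap goal state.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
q = np.array([1.0, 1.0, 0.0])
z, u = lmdp_desirability(P, q)
```

Because the Bellman backup is linear in z, the whole solve is a single eigenvector computation rather than a nonlinear value iteration; this linearity is what LS-GKDP exploits when lifting control from primitive actions to goal-conditioned options.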