Efficient and Interpretable Robot Manipulation with Graph Neural Networks
Manipulation tasks like loading a dishwasher can be seen as a sequence of spatial constraints and relationships between different objects. For example, a plate can be placed in a tray only if the tray is open. We aim to discover such task-specific rules from demonstrations. We pose manipulation as a classification problem over a graph whose nodes represent task-relevant entities like objects and goals. We transform the environment scene into a graph and learn a graph neural network (GNN) policy using imitation learning. In our experiments, a single learned GNN policy, trained using 20 expert demonstrations, can solve multiple block-stacking and rearrangement tasks both in simulation and on hardware, without any task description. The policy successfully generalizes over the number of objects in the environment, their positions, and goal configurations (trained on single stacks, it generalizes to pyramids and multiple stacks). We also apply our approach to a complex simulated dishwasher environment, where a robot learns to load a dishwasher from only 5 high-level human demonstrations. These experiments show that imitation learning on a graphical state and policy is a simple yet powerful tool for solving complex long-horizon manipulation problems, without requiring detailed task descriptions. Videos can be found at: https://youtu.be/x9hcKBh6K0A.
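To make the idea of "classification over a graph" concrete, the following is a minimal sketch in plain Python and is not the architecture used in the paper. It assumes a fully connected graph, node features of the form [is_object, is_goal, x, y], one round of mean-aggregation message passing, and a softmax over nodes as the policy output. All function names, feature choices, and the hidden size are hypothetical, and the weights are random rather than learned. In an imitation-learning setup one would presumably fit them with a cross-entropy loss against the node the expert selected.

```python
import math
import random


def build_graph(objects, goals):
    """Turn a scene into node features [is_object, is_goal, x, y] (assumed layout)."""
    nodes = [[1.0, 0.0, x, y] for (x, y) in objects]
    nodes += [[0.0, 1.0, x, y] for (x, y) in goals]
    return nodes


def linear(vec, weights):
    """Multiply a feature vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def message_pass(nodes, w_self, w_nbr):
    """One round of mean-aggregation message passing on a fully connected graph."""
    dim = len(nodes[0])
    updated = []
    for i, h in enumerate(nodes):
        others = [n for j, n in enumerate(nodes) if j != i]
        mean = [sum(o[k] for o in others) / max(len(others), 1) for k in range(dim)]
        z = [a + b for a, b in zip(linear(h, w_self), linear(mean, w_nbr))]
        updated.append([max(0.0, v) for v in z])  # ReLU nonlinearity
    return updated


def node_distribution(embeddings, w_out):
    """Softmax over nodes: a categorical distribution over which node to act on."""
    scores = [sum(w * v for w, v in zip(w_out, h)) for h in embeddings]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Random, untrained weights (illustration only); the same weights serve any graph size.
rng = random.Random(0)
HIDDEN = 8
w_self = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(HIDDEN)]
w_nbr = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(HIDDEN)]
w_out = [rng.uniform(-1, 1) for _ in range(HIDDEN)]


def policy(objects, goals):
    """Map a scene to a probability distribution over its nodes."""
    nodes = build_graph(objects, goals)
    return node_distribution(message_pass(nodes, w_self, w_nbr), w_out)


probs_small = policy([(0.1, 0.2), (0.4, 0.5)], [(0.8, 0.8)])
probs_large = policy(
    [(0.1, 0.2), (0.4, 0.5), (0.7, 0.1), (0.3, 0.9)],
    [(0.8, 0.8), (0.2, 0.6)],
)
```

Because the weights are shared across nodes, the same `policy` function accepts scenes with three nodes or six without modification. This is one plausible reason a graph-based policy can generalize over the number of objects, although the sketch makes no claim about how the paper achieves it.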