192
v1v2 (latest)

Vision-based Manipulation from Single Human Video with Open-World Object Graphs

Main:15 Pages
7 Figures
Bibliography:4 Pages
1 Tables
Appendix:8 Pages
Abstract

This work presents an object-centric approach to learning vision-based manipulation skills from human videos. We investigate the problem of robot manipulation via imitation in the open-world setting, where a robot learns to manipulate novel objects from a single video demonstration. We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB or RGB-D video and deriving a policy that conditions on the extracted plan. Our method enables the robot to learn from videos captured by daily mobile devices and to generalize the policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances. We systematically evaluate our method on both short-horizon and long-horizon tasks, using RGB-D and RGB-only demonstration videos. Across varied tasks and demonstration types (RGB-D / RGB), we observe an average success rate of 74.4%, demonstrating the efficacy of ORION in learning from a single human video in the open world. Additional materials can be found on our project website:this https URL.

View on arXiv
Comments on this paper

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from. See our policy.