We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task with an object-centric representation: the SE(3) pose trajectory of the object relative to the target. This representation decouples embodiment actions from sensory inputs, which facilitates learning from various demonstration types, including both action-based and action-less human hand demonstrations, and supports cross-embodiment generalization. Object pose trajectories also inherently capture planning constraints from demonstrations, without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We systematically evaluate our method on simulated and real-world tasks. In the real-world evaluation, using only eight demonstrations recorded on an iPhone, our approach completed all tasks while fully complying with task constraints.
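As an illustration of the representation described above, the following minimal sketch expresses a sequence of object poses in the target's frame. It is not taken from the SPOT implementation: the function names (`make_pose`, `invert_pose`, `relative_trajectory`, `rot_z`) are hypothetical, and it assumes poses are given as 4x4 homogeneous matrices in a shared world frame.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous SE(3) matrix from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def invert_pose(pose):
    """Invert an SE(3) matrix in closed form: (R, t) -> (R^T, -R^T t)."""
    rotation = pose[:3, :3]
    translation = pose[:3, 3]
    inverse = np.eye(4)
    inverse[:3, :3] = rotation.T
    inverse[:3, 3] = -rotation.T @ translation
    return inverse


def relative_trajectory(object_poses_world, target_pose_world):
    """Express each world-frame object pose in the target's frame.

    Assumption (hypothetical interface): object_poses_world is an iterable of
    4x4 matrices and target_pose_world is a single 4x4 matrix.
    """
    target_inverse = invert_pose(target_pose_world)
    return np.stack([target_inverse @ pose for pose in object_poses_world])


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy data: a target rotated 90 degrees about z, and a two-step object trajectory
# whose final pose coincides with the target pose.
target = make_pose(rot_z(np.pi / 2), [1.0, 0.0, 0.0])
trajectory = [
    make_pose(np.eye(3), [1.0, 1.0, 0.0]),
    target.copy(),
]
relative = relative_trajectory(trajectory, target)
```

Because the resulting trajectory depends only on the object and the target, it contains no robot-specific quantities, which is the property the abstract relies on when it speaks of decoupling embodiment actions from sensory inputs.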
@article{hsu2025_2411.00965,
  title={SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation},
  author={Cheng-Chun Hsu and Bowen Wen and Jie Xu and Yashraj Narang and Xiaolong Wang and Yuke Zhu and Joydeep Biswas and Stan Birchfield},
  journal={arXiv preprint arXiv:2411.00965},
  year={2025}
}