Coarse-to-Fine 3D Keyframe Transporter

Recent advances in Keyframe Imitation Learning (IL) have enabled learning-based agents to solve a diverse range of manipulation tasks. However, most approaches ignore the rich symmetries in the problem setting and are consequently sample-inefficient. This work identifies and exploits the bi-equivariant symmetry within Keyframe IL to design a policy that generalizes to transformations of both the workspace and the objects grasped by the gripper. We make two main contributions. First, we analyze the bi-equivariance properties of the keyframe action scheme and propose a Keyframe Transporter, derived from Transporter Networks, which evaluates actions using cross-correlation between the features of the grasped object and the features of the scene. Second, we propose a computationally efficient coarse-to-fine SE(3) action evaluation scheme for reasoning about the intertwined translation and rotation components of the action. The resulting method outperforms strong Keyframe IL baselines by an average of more than 10% on a wide range of simulation tasks, and by an average of 55% in four physical experiments.
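
To make the two ideas in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It works in 2D with four 90-degree rotations instead of SE(3), and it uses raw integer grids in place of learned features. The names `rot90`, `score`, `best_action`, and `coarse_to_fine`, as well as the `stride` and `radius` parameters, are hypothetical and chosen only for illustration. Each candidate action (row, column, rotation) is scored by cross-correlating the rotated object patch with a scene window; a coarse pass on a strided grid picks a region, and a fine pass re-evaluates only the neighbourhood of the coarse winner.

```python
from itertools import product


def rot90(patch, k):
    """Rotate a square patch counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        patch = [list(row) for row in zip(*patch)][::-1]
    return patch


def score(scene, kernel, i, j):
    """Cross-correlate kernel with the scene window whose top-left is (i, j)."""
    n = len(kernel)
    return sum(scene[i + a][j + b] * kernel[a][b]
               for a in range(n) for b in range(n))


def best_action(scene, patch, rows, cols):
    """Score every (row, col, rotation) candidate and return the best one."""
    candidates = product(rows, cols, range(4))
    return max(candidates,
               key=lambda c: score(scene, rot90(patch, c[2]), c[0], c[1]))


def coarse_to_fine(scene, patch, stride=4, radius=3):
    """Coarse pass on a strided grid, then a dense pass around the winner."""
    limit = len(scene) - len(patch) + 1  # valid offsets per axis (square scene)
    coarse = range(0, limit, stride)
    i0, j0, _ = best_action(scene, patch, coarse, coarse)
    fine_rows = range(max(0, i0 - radius), min(limit, i0 + radius + 1))
    fine_cols = range(max(0, j0 - radius), min(limit, j0 + radius + 1))
    return best_action(scene, patch, fine_rows, fine_cols)


# Toy data (assumed): a 16x16 empty scene containing the object patch,
# rotated once (90 degrees) and placed with its top-left corner at (9, 6).
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
scene = [[0] * 16 for _ in range(16)]
placed = rot90(patch, 1)
for a in range(3):
    for b in range(3):
        scene[9 + a][6 + b] = placed[a][b]

print(coarse_to_fine(scene, patch))  # (9, 6, 1)
```

On this toy problem the two passes evaluate 64 + 196 = 260 candidates, compared with 784 for an exhaustive search over all offsets and rotations. The sketch relies on the coarse grid overlapping the true peak; when the peak is narrower than the stride, a coarse pass like this one can miss it.
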
@article{zhu2025_2502.01773,
  title={Coarse-to-Fine 3D Keyframe Transporter},
  author={Xupeng Zhu and David Klee and Dian Wang and Boce Hu and Haojie Huang and Arsh Tangri and Robin Walters and Robert Platt},
  journal={arXiv preprint arXiv:2502.01773},
  year={2025}
}