OPAL: Encoding Causal Understanding of Physical Systems for Robot Learning

9 April 2025

Daniel Tcheurekdjian


Main: 12 pages, Bibliography: 2 pages, Appendix: 5 pages, 2 figures, 3 tables

Abstract

We present OPAL (Operant Physical Agent with Language), a novel vision-language-action architecture that introduces topological constraints to flow matching for robotic control. To do so, we further introduce topological attention. Our approach models action sequences as topologically structured representations with non-trivial constraints. Experimental results across 10 complex manipulation tasks demonstrate OPAL's superior performance compared to previous approaches, including Octo, OpenVLA, and $\pi_0$.
