Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

7 May 2025

Abdulaziz Almuzairee

Rohan Patil

Dwait Bhatt

Henrik I. Christensen

Main: 9 pages

16 figures

Bibliography: 5 pages

1 table

Appendix: 9 pages

Abstract

Vision is well known for its use in manipulation, especially through visual servoing. Because the world is three-dimensional, using multiple camera views and merging them creates better representations for Q-learning and, in turn, trains more sample-efficient policies. Nevertheless, these multi-view policies are sensitive to failing cameras and can be burdensome to deploy. To mitigate these issues, we introduce a Merge And Disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while simultaneously disentangling views by augmenting multi-view feature inputs with single-view features. This produces robust policies and allows lightweight deployment. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3. For the project website and code, see this https URL
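To make the idea of "augmenting multi-view feature inputs with single-view features" more concrete, here is a minimal, hypothetical sketch. It is not the authors' implementation. It assumes each camera view has already been encoded into a feature vector of the same size, and it assumes views are merged by element-wise summation so that a merged feature and a single-view feature have the same shape. The helper names `merge_views` and `augment_with_single_views` are illustrative only.

```python
from typing import List

Vector = List[float]


def merge_views(view_features: List[Vector]) -> Vector:
    """Merge per-view features by element-wise summation (an assumed merge operator)."""
    if not view_features:
        raise ValueError("need at least one view")
    dim = len(view_features[0])
    if any(len(f) != dim for f in view_features):
        raise ValueError("all views must share the same feature dimension")
    # Summation keeps the output dimension fixed regardless of the number of views.
    return [sum(vals) for vals in zip(*view_features)]


def augment_with_single_views(view_features: List[Vector]) -> List[Vector]:
    """Return the merged multi-view feature followed by each single-view feature.

    Because the merge preserves the feature dimension, every entry can be fed to
    the same downstream network, which is then exposed to both merged and
    single-view inputs during training.
    """
    merged = merge_views(view_features)
    return [merged] + [list(f) for f in view_features]


# Usage with two hypothetical, already-encoded camera views.
front = [1.0, 0.0, 2.0]
wrist = [0.5, 1.5, -1.0]
batch = augment_with_single_views([front, wrist])
print(batch[0])  # [1.5, 1.5, 1.0]  (merged features)
print(batch[1:])  # [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0]]  (single-view features)
```

Under these assumptions, a downstream network trained on such batches has already seen single-view inputs. This is one plausible reading of why the abstract says disentangling views yields robustness to failing cameras and allows deployment with fewer cameras: if only one view remains, its feature still has the shape the network expects.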
