BiFold: Bimanual Cloth Folding with Language Guidance

IEEE International Conference on Robotics and Automation (ICRA), 2025

27 January 2025

Oriol Barbany

Adrià Colomé

Carme Torras


Main: 6 pages

28 figures

Bibliography: 2 pages

8 tables

Appendix: 22 pages

Abstract

Cloth folding is a complex task because clothes inevitably self-occlude, have complicated dynamics, and vary widely in material, geometry, and texture. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires both sophisticated language understanding and manipulation capabilities. To this end, we repurpose a pre-trained vision-language model to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. Because annotated bimanual folding data is lacking, we devise a procedure that automatically parses the actions in a simulated dataset and tags them with aligned text instructions. BiFold attains the best performance on this new dataset and transfers to new instructions, garments, and environments.

