Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

16 April 2024
Peiyuan Zhi
Zhiyuan Zhang
Muzhi Han
Zeyu Zhang
Zhitian Li
Ziyuan Jiao
Siyuan Huang
    LRM
    LM&Ro
Abstract

Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. In this work, we present COME-robot, the first closed-loop robotic system utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot incorporates two key innovative modules: (i) a multi-level open-vocabulary perception and situated reasoning module that enables effective exploration of the 3D environment and target object identification using commonsense knowledge and situated information, and (ii) an iterative closed-loop feedback and restoration mechanism that verifies task feasibility, monitors execution success, and traces failure causes across different modules for robust failure recovery. Through comprehensive experiments involving 8 challenging real-world mobile and tabletop manipulation tasks, COME-robot demonstrates a significant improvement in task success rate (~35%) compared to state-of-the-art methods. We further conduct comprehensive analyses to elucidate how COME-robot's design facilitates failure recovery, free-form instruction following, and long-horizon task planning.
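
The closed-loop feedback and restoration idea in (ii) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `StepResult`, `run_closed_loop`, `plan`, `is_feasible`, and `execute`, as well as the failure-cause labels, are hypothetical and chosen only for illustration. The sketch assumes a planner that receives the history of outcomes so far and returns a fresh list of steps, a feasibility check run before each step, and an executor that reports success or a failure cause.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    """Outcome of one executed step (hypothetical structure)."""
    success: bool
    failure_cause: Optional[str] = None  # e.g. "perception" or "grasp" (illustrative labels)


def run_closed_loop(
    plan: Callable[[List[str]], List[str]],
    is_feasible: Callable[[str], bool],
    execute: Callable[[str], StepResult],
    max_replans: int = 3,
) -> bool:
    """Plan, check feasibility, execute, and verify; on failure, log the cause and replan."""
    history: List[str] = []
    for _ in range(max_replans + 1):
        steps = plan(history)  # the planner sees all earlier outcomes
        failed = False
        for step in steps:
            if not is_feasible(step):
                history.append(f"infeasible:{step}")
                failed = True
                break
            result = execute(step)
            if not result.success:
                # Record which module the failure is attributed to before replanning.
                history.append(f"failed:{step}:{result.failure_cause}")
                failed = True
                break
            history.append(f"done:{step}")
        if not failed:
            return True  # every step of the current plan succeeded
    return False  # replanning budget exhausted
```

In a real system the planner would be a call to a vision-language model and the executor would drive the robot; here both are plain callables so the control flow can be exercised with stubs.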

@article{zhi2025_2404.10220,
  title={Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V},
  author={Peiyuan Zhi and Zhiyuan Zhang and Yu Zhao and Muzhi Han and Zeyu Zhang and Zhitian Li and Ziyuan Jiao and Baoxiong Jia and Siyuan Huang},
  journal={arXiv preprint arXiv:2404.10220},
  year={2025}
}