
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback

Abstract

Despite significant advancements in robotic manipulation, achieving consistent and stable grasping remains a fundamental challenge, often limiting the successful execution of complex tasks. Our analysis reveals that even state-of-the-art policy models frequently exhibit unstable grasping behaviors, leading to failure cases that create bottlenecks in real-world robotic applications. To address these challenges, we introduce GraspCorrect, a plug-and-play module designed to enhance grasp performance through vision-language model-guided feedback. GraspCorrect employs an iterative visual question-answering framework with two key components: grasp-guided prompting, which incorporates task-specific constraints, and object-aware sampling, which ensures the selection of physically feasible grasp candidates. By iteratively generating intermediate visual goals and translating them into joint-level actions, GraspCorrect significantly improves grasp stability and consistently enhances task success rates across existing policy models on the RLBench and CALVIN benchmarks.
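
The loop described above can be summarized as: sample feasible grasp candidates, let the VLM choose an intermediate visual goal under task-specific constraints, act toward that goal, and repeat until the grasp is stable. The following is a minimal Python sketch of that control flow only; the helper names (sample_candidates, query_vlm, execute_toward, grasp_is_stable) are hypothetical placeholders, not the authors' actual API.

# Minimal sketch of the iterative VQA-style feedback loop; all helper names are
# illustrative placeholders standing in for the components named in the abstract.
from typing import Any, Callable, Sequence

Image = Any   # placeholder for a camera observation
Grasp = Any   # placeholder for a grasp candidate (e.g., contact point + approach)

def grasp_correct(
    observe: Callable[[], Image],
    sample_candidates: Callable[[Image], Sequence[Grasp]],      # object-aware sampling
    query_vlm: Callable[[Image, Sequence[Grasp], str], Grasp],   # grasp-guided prompting (VQA)
    execute_toward: Callable[[Grasp], None],                     # visual goal -> joint-level actions
    grasp_is_stable: Callable[[Image], bool],
    task_prompt: str,
    max_iters: int = 5,
) -> bool:
    """Iteratively propose feasible grasp candidates, ask the VLM to pick an
    intermediate visual goal under task-specific constraints, and act on it."""
    for _ in range(max_iters):
        image = observe()
        if grasp_is_stable(image):
            return True
        candidates = sample_candidates(image)              # keep only physically feasible grasps
        goal = query_vlm(image, candidates, task_prompt)   # select the next intermediate visual goal
        execute_toward(goal)                               # low-level correction toward that goal
    return grasp_is_stable(observe())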

@article{lee2025_2503.15035,
  title={GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback},
  author={Sungjae Lee and Yeonjoo Hong and Kwang In Kim},
  journal={arXiv preprint arXiv:2503.15035},
  year={2025}
}