
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback

Abstract

Despite significant advancements in robotic manipulation, achieving consistent and stable grasping remains a fundamental challenge, often limiting the successful execution of complex tasks. Our analysis reveals that even state-of-the-art policy models frequently exhibit unstable grasping behaviors, leading to failure cases that create bottlenecks in real-world robotic applications. To address these challenges, we introduce GraspCorrect, a plug-and-play module designed to enhance grasp performance through vision-language model-guided feedback. GraspCorrect employs an iterative visual question-answering framework with two key components: grasp-guided prompting, which incorporates task-specific constraints, and object-aware sampling, which ensures the selection of physically feasible grasp candidates. By iteratively generating intermediate visual goals and translating them into joint-level actions, GraspCorrect significantly improves grasp stability and consistently enhances task success rates across existing policy models on the RLBench and CALVIN benchmarks.
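
The loop described above can be summarized as: sample feasible grasp candidates, let the VLM choose an intermediate visual goal under task-specific constraints, act toward that goal, and repeat until the grasp is stable. The following is a minimal Python sketch of that control flow only; the helper names (sample_candidates, query_vlm, execute_toward, grasp_is_stable) are hypothetical placeholders, not the authors' actual API.

# Minimal sketch of the iterative VQA-style feedback loop; all helper names are
# illustrative placeholders standing in for the components named in the abstract.
from typing import Any, Callable, Sequence

Image = Any   # placeholder for a camera observation
Grasp = Any   # placeholder for a grasp candidate (e.g., contact point + approach)

def grasp_correct(
    observe: Callable[[], Image],
    sample_candidates: Callable[[Image], Sequence[Grasp]],      # object-aware sampling
    query_vlm: Callable[[Image, Sequence[Grasp], str], Grasp],   # grasp-guided prompting (VQA)
    execute_toward: Callable[[Grasp], None],                     # visual goal -> joint-level actions
    grasp_is_stable: Callable[[Image], bool],
    task_prompt: str,
    max_iters: int = 5,
) -> bool:
    """Iteratively propose feasible grasp candidates, ask the VLM to pick an
    intermediate visual goal under task-specific constraints, and act on it."""
    for _ in range(max_iters):
        image = observe()
        if grasp_is_stable(image):
            return True
        candidates = sample_candidates(image)              # keep only physically feasible grasps
        goal = query_vlm(image, candidates, task_prompt)   # select the next intermediate visual goal
        execute_toward(goal)                               # low-level correction toward that goal
    return grasp_is_stable(observe())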

@article{lee2025_2503.15035,
  title={GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback},
  author={Sungjae Lee and Yeonjoo Hong and Kwang In Kim},
  journal={arXiv preprint arXiv:2503.15035},
  year={2025}
}