Learn to Grasp with Less Supervision: A Data-Efficient Maximum Likelihood Grasp Sampling Loss
Robotic grasping of a diverse set of objects is essential in many robot manipulation tasks. One promising approach is to learn deep grasping models from training datasets of object images and grasp labels. Approaches in this category require millions of data points to train deep models. However, empirical grasping datasets typically consist of sparsely labeled samples (i.e., a limited number of successful grasp labels in each image). This paper proposes a Maximum Likelihood Grasp Sampling Loss (MLGSL) to tackle the data sparsity issue. The proposed method assumes that successful grasp labels are sampled from a ground-truth grasp distribution and aims to recover the ground-truth map. MLGSL is used to train a fully convolutional network that detects thousands of grasps simultaneously. Training results suggest that models based on MLGSL can learn to grasp from datasets containing 2 labels per image, implying that the method is 8x more data-efficient than current state-of-the-art techniques. Meanwhile, physical robot experiments demonstrate equivalent performance in detecting robust grasps, with a 91.8% grasp success rate on household objects.
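The idea of treating sparse grasp labels as samples from an underlying distribution can be illustrated with a minimal sketch. This is not the paper's MLGSL formulation, which the abstract does not specify. It assumes a network that outputs one unnormalized score per pixel, normalizes those scores with a softmax over the whole image, and minimizes the negative log-likelihood of the few labeled grasp pixels. The names `log_softmax_2d` and `sampled_grasp_nll` are hypothetical.

```python
import numpy as np


def log_softmax_2d(logits):
    """Log-softmax over all pixels of an (H, W) map, computed stably."""
    flat = logits.reshape(-1)
    shifted = flat - flat.max()  # subtract the max to avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return log_probs.reshape(logits.shape)


def sampled_grasp_nll(logits, grasp_pixels):
    """Mean negative log-likelihood of the labeled grasp pixels.

    logits: (H, W) array of unnormalized scores (assumed network output).
    grasp_pixels: list of (row, col) positions of successful grasp labels,
        treated as samples from the ground-truth grasp distribution.
    """
    log_probs = log_softmax_2d(logits)
    rows, cols = zip(*grasp_pixels)
    return -log_probs[list(rows), list(cols)].mean()


# Toy example: a 4x4 score map with 2 labels per image.
labels = [(1, 2), (3, 0)]

logits = np.zeros((4, 4))  # uniform prediction over 16 pixels
uniform_loss = sampled_grasp_nll(logits, labels)

logits[1, 2] = logits[3, 0] = 5.0  # probability mass moved onto the labels
peaked_loss = sampled_grasp_nll(logits, labels)
```

In this sketch only the labeled pixels enter the loss directly; unlabeled pixels are not penalized as failures, and they influence the loss only through the normalization. Under the stated assumptions, this is one way a loss can remain usable when each image carries very few labels.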