Multimodal Deep Learning for Robust RGB-D Object Recognition

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015

24 July 2015

Andreas Eitel

Jost Tobias Springenberg

Luciano Spinello

Martin Riedmiller

Wolfram Burgard


Abstract

Robust object recognition is a crucial ingredient of many robotics applications and a prerequisite for solving challenging tasks such as manipulating or grasping objects. Recently, convolutional neural networks (CNNs) have been shown to outperform conventional computer vision algorithms for object recognition from images in many large-scale recognition tasks. In this paper, we investigate the potential for using CNNs to solve object recognition from combined RGB and depth images. We present a new network architecture with two processing streams that learns descriptive features from both input modalities and can learn to fuse information automatically before classification. We introduce an effective encoding of depth information for CNNs and report state-of-the-art performance on the challenging RGB-D Object dataset, where we achieve a recognition accuracy of 91.3%. Finally, to facilitate usage of our method "in the wild", we present a novel synthetic data augmentation approach for training with depth images that improves recognition accuracy in real-world environments.
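
The two-stream idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: each CNN stream is replaced by a single dense layer, fusion is assumed to be feature concatenation followed by a learned layer, the depth input is assumed to be already encoded as a three-channel image, and all class names, layer sizes, and the 8x8 input size are hypothetical. The 51 output classes correspond to the number of categories in the RGB-D Object dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class Stream:
    """Hypothetical stand-in for one CNN stream: flatten, dense layer, ReLU."""
    def __init__(self, in_dim, feat_dim):
        self.W = rng.normal(0.0, 0.01, (in_dim, feat_dim))
        self.b = np.zeros(feat_dim)

    def __call__(self, x):
        return relu(x.reshape(len(x), -1) @ self.W + self.b)

class TwoStreamClassifier:
    """Separate RGB and depth streams, fused before classification (assumed design)."""
    def __init__(self, in_dim, feat_dim, fusion_dim, num_classes):
        self.rgb_stream = Stream(in_dim, feat_dim)
        self.depth_stream = Stream(in_dim, feat_dim)
        # The fusion layer sees the concatenated features of both streams.
        self.W_fuse = rng.normal(0.0, 0.01, (2 * feat_dim, fusion_dim))
        self.b_fuse = np.zeros(fusion_dim)
        self.W_out = rng.normal(0.0, 0.01, (fusion_dim, num_classes))
        self.b_out = np.zeros(num_classes)

    def __call__(self, rgb, depth):
        feats = np.concatenate(
            [self.rgb_stream(rgb), self.depth_stream(depth)], axis=1
        )
        fused = relu(feats @ self.W_fuse + self.b_fuse)
        return softmax(fused @ self.W_out + self.b_out)

# Toy batch: 4 RGB images and 4 depth images, each 8x8 with 3 channels.
rgb = rng.random((4, 8, 8, 3))
depth = rng.random((4, 8, 8, 3))
model = TwoStreamClassifier(in_dim=8 * 8 * 3, feat_dim=32, fusion_dim=16, num_classes=51)
probs = model(rgb, depth)  # one probability vector over classes per image pair
```

Because the weights here are random and untrained, the predicted probabilities are meaningless; the sketch only shows how features from both modalities flow into a shared classifier.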
