MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild

Neural Information Processing Systems (NeurIPS), 2016
Abstract

In this paper, we address the problem of 3D human pose understanding in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images and 2D human pose annotations using 3D Motion Capture (MoCap) data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms the state of the art in 3D pose estimation in controlled environments (Human3.6M) and shows promising results on in-the-wild images (LSP).
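As a rough illustration of two steps named in the abstract, the sketch below shows (1) a per-joint image selection that picks, for each joint, the dataset image whose 2D pose best matches the projected 3D pose around that joint, and (2) the assignment of 3D poses to K pose classes by nearest centroid. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `neighbors` dictionary, the squared Euclidean cost, and the centering on the query joint are all hypothetical choices, and the patch stitching and CNN training are not shown.

```python
import numpy as np


def select_image_per_joint(projected_2d, dataset_2d, neighbors):
    """Pick one dataset image per joint by local 2D pose similarity.

    projected_2d: (J, 2) array, the candidate 3D pose projected to 2D.
    dataset_2d:   (N, J, 2) array of 2D pose annotations of real images.
    neighbors:    dict mapping joint index -> list of adjacent joints
                  (hypothetical definition of "local").
    Returns an int array of shape (J,) with the chosen image index per joint.
    """
    num_joints = projected_2d.shape[0]
    selected = np.empty(num_joints, dtype=int)
    for j in range(num_joints):
        local = [j] + list(neighbors[j])
        # Center on joint j so the comparison ignores global translation
        # (an assumption made for this sketch).
        query = projected_2d[local] - projected_2d[j]
        candidates = dataset_2d[:, local, :] - dataset_2d[:, j:j + 1, :]
        # Squared Euclidean cost over the local joints, one value per image.
        cost = np.sum((candidates - query) ** 2, axis=(1, 2))
        selected[j] = int(np.argmin(cost))
    return selected


def assign_pose_classes(poses_3d, centroids):
    """Label each 3D pose with the index of its nearest class centroid.

    poses_3d:  (N, J, 3) array of 3D poses.
    centroids: (K, J, 3) array of pose class centers, e.g. from k-means
               (the clustering algorithm is an assumption here).
    Returns an int array of shape (N,) with labels in [0, K).
    """
    flat_poses = poses_3d.reshape(len(poses_3d), -1)
    flat_centroids = centroids.reshape(len(centroids), -1)
    dists = ((flat_poses[:, None, :] - flat_centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```

With class labels of this kind, a network can be trained with a standard K-way classification loss, and a predicted class can be mapped back to a 3D pose through its centroid.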
