Playing for Depth

Estimating the relative depth of a scene is a significant step towards understanding the general structure of the depicted scenery, the relations between entities in the scene, and their interactions. When estimating depth without stereo images, we depend on the availability of large-scale depth datasets and high-capacity models to capture the intrinsic nature of depth. Unfortunately, creating depth-image datasets is not a trivial task, as the requirements of depth cameras largely limit us to environments where such sensors can operate. In this work, we present a new depth dataset captured from video games in an easy and reproducible way. The nature of open-world video games allows us to capture high-quality depth maps in the wild, without the constraints of stereo cameras. Experiments on this dataset show that using such synthetic data increases the accuracy of monocular depth estimation in the wild, where other approaches usually fail to generalize.
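To make the training setup concrete, below is a minimal sketch of the scale-invariant log-depth loss of Eigen et al. (2014), a common objective for monocular depth estimation trained on dense depth maps such as those rendered from a game engine. The loss choice, the validity mask, and the weighting factor are illustrative assumptions and are not taken from the work described above.

```python
# Sketch: scale-invariant log-depth loss commonly used for monocular depth
# estimation. Assumes dense ground-truth depth with zeros marking invalid pixels.
import torch


def scale_invariant_loss(pred_depth: torch.Tensor,
                         gt_depth: torch.Tensor,
                         lam: float = 0.5,
                         eps: float = 1e-6) -> torch.Tensor:
    """Scale-invariant loss in log-depth space over valid (positive) pixels."""
    valid = gt_depth > 0                       # ignore pixels without ground truth
    d = torch.log(pred_depth[valid] + eps) - torch.log(gt_depth[valid] + eps)
    return (d ** 2).mean() - lam * d.mean() ** 2


if __name__ == "__main__":
    pred = torch.rand(2, 1, 64, 64) * 10 + 0.1   # placeholder network output (metres)
    gt = torch.rand(2, 1, 64, 64) * 10           # placeholder ground-truth depth map
    print(scale_invariant_loss(pred, gt).item())
```

Because the loss is invariant to a global log-scale shift, it fits the relative-depth setting described here, where absolute metric scale from a game engine need not match real-world scale.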