
Geometry-Based Next Frame Prediction from Monocular Video

Abstract

We propose a method for next frame prediction from video input. A convolutional recurrent neural network is trained to predict depth from monocular video input, which, along with the current video image and the camera trajectory, can then be used to compute the next frame. Unlike prior next-frame prediction approaches, we take advantage of the scene geometry and use the predicted depth to generate the next frame. A useful side effect of our technique is that it produces depth from video, which can be used in other applications. We evaluate the proposed approach on the KITTI raw dataset, which is collected from a vehicle moving through urban environments, and compare against state-of-the-art models for next frame prediction. We show that our method produces visually and numerically superior results to existing methods that directly predict the next frame.
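To illustrate the geometric step the abstract describes (computing the next frame from the current image, predicted depth, and camera motion), here is a minimal sketch in plain NumPy. The camera intrinsics `K`, the rigid transform `T_next_from_curr`, and the nearest-depth forward splatting are assumptions for illustration; the paper's actual network and sampling procedure are not reproduced here.

```python
import numpy as np

def predict_next_frame(image, depth, K, T_next_from_curr):
    """Warp the current frame into the next camera view using predicted depth
    and known camera motion (illustrative sketch, not the paper's pipeline).

    image: (H, W, 3) current RGB frame
    depth: (H, W) predicted per-pixel depth for the current frame
    K: (3, 3) camera intrinsics (assumed known)
    T_next_from_curr: (4, 4) rigid transform from current to next camera frame
    """
    H, W = depth.shape

    # Pixel grid in homogeneous coordinates, flattened row-major.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)   # (3, H*W)

    # Back-project pixels to 3D points in the current camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)              # (3, H*W)

    # Transform the points into the next camera frame.
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])             # (4, H*W)
    pts_next = (T_next_from_curr @ pts_h)[:3]

    # Project into the next image plane.
    proj = K @ pts_next
    z = np.clip(proj[2], 1e-6, None)
    u_n = np.round(proj[0] / z).astype(int)
    v_n = np.round(proj[1] / z).astype(int)

    # Forward-splat colors; where pixels collide, the nearest depth wins.
    next_frame = np.zeros_like(image)
    zbuf = np.full((H, W), np.inf)
    colors = image.reshape(-1, 3)
    valid = (u_n >= 0) & (u_n < W) & (v_n >= 0) & (v_n < H)
    for i in np.flatnonzero(valid):
        x, y, d = u_n[i], v_n[i], z[i]
        if d < zbuf[y, x]:
            zbuf[y, x] = d
            next_frame[y, x] = colors[i]
    return next_frame
```

In this sketch, occlusions are resolved with a simple z-buffer and disoccluded regions are left black; a learned or differentiable sampling scheme would handle these regions more gracefully.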
