Learning Disentangled Representations of Video with Missing Data

Neural Information Processing Systems (NeurIPS), 2020

23 June 2020

Armand Comas Massague

Abstract

Missing data poses significant challenges when learning representations of video sequences. We present the Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object. DIVE imputes each object's trajectory where data is missing. On a moving MNIST dataset with various missing-data scenarios, DIVE outperforms state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting. Our code and data can be found at https://github.com/Rose-STL-Lab/DIVE.
