
Metrics for Exposing the Biases of Content-Style Disentanglement

British Machine Vision Conference (BMVC), 2020
Abstract

A recent spate of state-of-the-art semi- and unsupervised solutions for challenging computer vision tasks encode image "content" into a spatial tensor and image appearance or "style" into a vector. Most of these solutions describe their representations as disentangled and employ different "biases" such as model design, learning objectives, and data to achieve good performance in spatially equivariant tasks (e.g. image-to-image translation). While considerable effort has been made to measure disentanglement in vector representations, we have lacked metrics for spatial content and vector style representations. In this paper, we propose such metrics to characterize the degree of disentanglement in terms of how (un)correlated and informative the content and style representations are, and we further examine its relation to task performance. In particular, we first identify key design choices and learning constraints in three popular models that employ content-style disentanglement and derive ablated versions. Second, we use our metrics to ascertain the role of each bias. Our experiments reveal a "sweet spot" between disentanglement, task performance, and latent-space interpretability. Our metrics are not task-dependent; thus, they can help guide either the design of future models or the selection among existing models such that this ideal "sweet spot" is achieved in any task where content-style representations are useful.
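The abstract does not spell out the metrics themselves, so the following is only a minimal sketch of how an "(un)correlatedness" score between a spatial content tensor and a style vector could be computed over a batch of encoded images, using empirical distance correlation as one possible choice. The function names, tensor shapes, and the use of distance correlation are illustrative assumptions, not necessarily the formulation used in the paper.

```python
import numpy as np

def _double_centre(d):
    # Double-centre a pairwise distance matrix (subtract row/column means, add grand mean).
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Empirical distance correlation between two sets of representations.

    x: (n, ...) array, e.g. spatial content tensors (flattened per sample).
    y: (n, ...) array, e.g. style vectors.
    Returns a value in [0, 1]; lower values indicate the two representations
    are closer to statistically independent, i.e. better disentangled.
    """
    x = np.asarray(x, dtype=np.float64).reshape(len(x), -1)
    y = np.asarray(y, dtype=np.float64).reshape(len(y), -1)
    # Pairwise Euclidean distance matrices for each representation.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    A, B = _double_centre(a), _double_centre(b)
    dcov2 = (A * B).mean()          # squared distance covariance
    dvar_x2 = (A * A).mean()        # squared distance variance of x
    dvar_y2 = (B * B).mean()        # squared distance variance of y
    denom = np.sqrt(dvar_x2 * dvar_y2)
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0

# Hypothetical usage: 64 encoded samples with content tensors of shape (8, 16, 16)
# and 8-dimensional style vectors (random data here, just to exercise the function).
rng = np.random.default_rng(0)
content = rng.normal(size=(64, 8, 16, 16))
style = rng.normal(size=(64, 8))
print(distance_correlation(content, style))
```

A score of this kind only captures the "(un)correlated" half of the characterization; an "informativeness" measure would additionally be needed to check that each representation still carries enough information to perform the downstream task.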
