A Controllable Appearance Representation for Flexible Transfer and Editing

Abstract

We present a method that computes an interpretable representation of material appearance within a highly compact, disentangled latent space. This representation is learned in a self-supervised fashion using an adapted FactorVAE. We train our model with a carefully designed unlabeled dataset, avoiding possible biases induced by human-generated labels. Our model demonstrates strong disentanglement and interpretability by effectively encoding material appearance and illumination, despite the absence of explicit supervision. Then, we use our representation as guidance for training a lightweight IP-Adapter to condition a diffusion pipeline that transfers the appearance of one or more images onto a target geometry, and allows the user to further edit the resulting appearance. Our approach offers fine-grained control over the generated results: thanks to the well-structured compact latent space, users can intuitively manipulate attributes such as hue or glossiness in image space to achieve the desired final appearance.
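The FactorVAE variant mentioned above augments a standard VAE objective with a total-correlation penalty, estimated with a discriminator via the density-ratio trick. As a minimal numerical sketch (not the paper's implementation; `gamma` and the function names here are illustrative assumptions):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) per sample for a diagonal Gaussian
    # posterior with mean `mu` and log-variance `logvar`, shape (batch, dim).
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)

def total_correlation_proxy(disc_logits):
    # Density-ratio trick: a discriminator scores whether a latent sample
    # came from q(z) or from the product of its marginals; the mean logit
    # approximates TC(z) = KL(q(z) || prod_j q(z_j)).
    return np.mean(disc_logits)

def factorvae_objective(recon_err, mu, logvar, disc_logits, gamma=10.0):
    # Total loss = reconstruction + KL + gamma * TC, where gamma (a
    # hyperparameter; 10.0 here is an arbitrary placeholder) controls how
    # strongly statistical independence between latent factors is enforced.
    kl = np.mean(kl_to_standard_normal(mu, logvar))
    tc = total_correlation_proxy(disc_logits)
    return recon_err + kl + gamma * tc
```

With a well-trained discriminator, pushing the mean logit down drives the aggregate posterior toward a factorized distribution, which is what yields the disentangled, per-attribute latent dimensions the abstract describes.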

@article{jimenez-navarro2025_2504.15028,
  title={A Controllable Appearance Representation for Flexible Transfer and Editing},
  author={Santiago Jimenez-Navarro and Julia Guerrero-Viu and Belen Masia},
  journal={arXiv preprint arXiv:2504.15028},
  year={2025}
}