
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

Computer Vision and Pattern Recognition (CVPR), 2023
Abstract

GAN inversion and editing via StyleGAN maps an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W}^+$, and $\mathcal{F}$) to simultaneously maintain image fidelity and meaningful manipulation. From the latent space $\mathcal{W}$ to the extended latent space $\mathcal{W}^+$ to the feature space $\mathcal{F}$ in StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore $\mathcal{W}^+$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability. As $\mathcal{W}^+$ and $\mathcal{F}$ are derived from $\mathcal{W}$, which is essentially the foundation latent space of StyleGAN, GAN inversion methods focusing on the $\mathcal{W}^+$ and $\mathcal{F}$ spaces could be improved by stepping back to $\mathcal{W}$. In this work, we propose to first obtain a precise latent code in the foundation latent space $\mathcal{W}$. We introduce contrastive learning to align $\mathcal{W}$ and the image space for precise latent code discovery. We then leverage a cross-attention encoder to transform the obtained latent code in $\mathcal{W}$ into $\mathcal{W}^+$ and $\mathcal{F}$, respectively. Our experiments show that our exploration of the foundation latent space $\mathcal{W}$ improves the representation ability of latent codes in $\mathcal{W}^+$ and features in $\mathcal{F}$, yielding state-of-the-art reconstruction fidelity and editability on standard benchmarks. Project page: https://kumapowerliu.github.io/CLCAE.
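The contrastive alignment between the image space and $\mathcal{W}$ can be illustrated with an InfoNCE-style objective, where matched image/latent embedding pairs are pulled together and mismatched pairs in the batch are pushed apart. The sketch below is a minimal NumPy illustration of that general idea; the function name, temperature value, and embedding shapes are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def info_nce_loss(img_emb, lat_emb, temperature=0.07):
    """InfoNCE-style contrastive loss between a batch of image embeddings
    and a batch of latent-code embeddings; row i of each array is a matched pair."""
    # L2-normalize both embedding sets so similarities are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    lat = lat_emb / np.linalg.norm(lat_emb, axis=1, keepdims=True)
    # (B, B) similarity matrix; matched pairs sit on the diagonal
    logits = img @ lat.T / temperature
    # Log-softmax over each row, then take the diagonal (matched-pair) terms
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss drives each latent embedding toward its own image embedding and away from the other images in the batch, which is one standard way to align two embedding spaces.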
