GAN inversion and editing via StyleGAN maps an input image into the embedding spaces (W, W+, and F) to simultaneously maintain image fidelity and meaningful manipulation. Moving from the latent space W to the extended latent space W+ to the feature space F in StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore W+ and F rather than W to improve reconstruction fidelity while maintaining editability. Since W+ and F are derived from W, which is essentially the foundation latent space of StyleGAN, GAN inversion methods focusing on the W+ and F spaces could be improved by stepping back to W. In this work, we propose to first obtain a precise latent code in the foundation latent space W, using contrastive learning to align W and the image space. We then leverage a cross-attention encoder to transform the obtained latent code in W into W+ and F, respectively. Our experiments show that this exploration of the foundation latent space W improves the representation ability of latent codes in W+ and features in F, yielding state-of-the-art reconstruction fidelity and editability on standard benchmarks. Project page: https://kumapowerliu.github.io/CLCAE.
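The contrastive alignment between W and the image space can be illustrated with a symmetric InfoNCE-style objective over a batch of paired embeddings, where each latent-code embedding is pulled toward its own image embedding and pushed from all others. This is a minimal NumPy sketch under assumed details (the paper's exact loss, encoders, and temperature are not specified here; `info_nce`, its signature, and the 0.07 temperature are illustrative):

```python
import numpy as np

def info_nce(latent, image, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    latent: (B, D) array of latent-code embeddings (e.g., from W)
    image:  (B, D) array of image embeddings
    Row i of `latent` and row i of `image` form a positive pair;
    all other rows in the batch serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    latent = latent / np.linalg.norm(latent, axis=1, keepdims=True)
    image = image / np.linalg.norm(image, axis=1, keepdims=True)
    logits = latent @ image.T / temperature  # (B, B) similarity matrix

    def ce_diag(l):
        # Cross-entropy with the matching (diagonal) entry as the target.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the latent-to-image and image-to-latent directions.
    return 0.5 * (ce_diag(logits) + ce_diag(logits.T))
```

Minimizing this loss drives each latent code and its corresponding image toward the same point on the unit sphere, which is one standard way to realize the alignment between W and the image space that the abstract describes.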