MEGA: Masked Generative Autoencoder for Human Mesh Recovery

Abstract

Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as an infinite set of 3D interpretations can explain the 2D observation equally well. Nevertheless, most HMR methods overlook this issue and make a single prediction without accounting for this ambiguity. A few approaches generate a distribution of human meshes, enabling the sampling of multiple predictions; however, none of them is competitive with the latest single-output model when making a single prediction. This work proposes a new approach based on masked generative modeling. By tokenizing the human pose and shape, we formulate the HMR task as generating a sequence of discrete tokens conditioned on an input image. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences. Given an image, our flexible generation scheme allows us to predict a single human mesh in deterministic mode or to generate multiple human meshes in stochastic mode. Experiments on in-the-wild benchmarks show that MEGA achieves state-of-the-art performance in deterministic and stochastic modes, outperforming single-output and multi-output approaches.
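
To make the two generation modes concrete, here is a minimal sketch of iterative masked decoding over a discrete token sequence, in the general style of masked generative models. It is an illustration under stated assumptions, not the authors' implementation: `toy_probs` is a hypothetical stand-in for the image-conditioned network, the cosine unmasking schedule is borrowed from common practice rather than taken from the abstract, and the sequence length, vocabulary size, and step count are arbitrary.

```python
import math
import random

MASK = -1  # placeholder id for a masked token (assumption)


def toy_probs(image_seed, position, vocab_size):
    """Hypothetical stand-in for an image-conditioned network.

    Returns a fixed probability distribution over the vocabulary for one
    position. A real model would also condition on the tokens already
    committed; this toy version ignores them.
    """
    rng = random.Random(image_seed * 1000 + position)
    weights = [rng.random() ** 4 + 1e-6 for _ in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(image_seed, seq_len=8, vocab_size=16, steps=4,
             stochastic=False, rng=None):
    """Fill a fully masked sequence over a fixed number of steps."""
    rng = rng or random.Random(0)
    tokens = [MASK] * seq_len
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Propose one token (with its probability) per masked position.
        proposals = []
        for i in masked:
            probs = toy_probs(image_seed, i, vocab_size)
            if stochastic:
                tok = rng.choices(range(vocab_size), weights=probs)[0]
            else:
                tok = max(range(vocab_size), key=probs.__getitem__)
            proposals.append((probs[tok], i, tok))
        # Cosine schedule: how many positions stay masked after this step.
        keep_masked = int(seq_len * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(masked) - keep_masked)
        if step == steps:
            n_commit = len(masked)  # the last step commits everything
        # Commit the most confident proposals; the rest stay masked.
        proposals.sort(reverse=True)
        for _, i, tok in proposals[:n_commit]:
            tokens[i] = tok
    return tokens


single = generate(image_seed=7)  # deterministic mode: one prediction
samples = [generate(7, stochastic=True, rng=random.Random(s))
           for s in range(3)]    # stochastic mode: several predictions
```

In this sketch the deterministic mode takes the most likely token at each position, so repeated calls return the same sequence, while the stochastic mode samples from the predicted distributions and can return a different sequence for each random seed. Decoding the tokens back into a mesh is outside the scope of the example.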

@article{fiche2025_2405.18839,
  title={MEGA: Masked Generative Autoencoder for Human Mesh Recovery},
  author={Guénolé Fiche and Simon Leglaive and Xavier Alameda-Pineda and Francesc Moreno-Noguer},
  journal={arXiv preprint arXiv:2405.18839},
  year={2025}
}