BinaryHPE: 3D Human Pose and Shape Estimation via Binarization

3D human pose and shape estimation (HPE) aims to reconstruct the 3D human body, face, and hands from a single image. Although powerful deep learning models achieve accurate estimates on this task, they require enormous memory and computational resources, which makes them difficult to deploy on resource-limited edge devices. In this work, we propose BinaryHPE, a novel binarization method designed to estimate 3D human body, face, and hand parameters efficiently. Specifically, we propose a novel binary backbone called the Binarized Dual Residual Network (BiDRN), designed to retain as much full-precision information as possible. Furthermore, we propose the Binarized BoxNet, an efficient sub-network for predicting face and hand bounding boxes, which further reduces model redundancy. Comprehensive quantitative and qualitative experiments demonstrate the effectiveness of BinaryHPE, which significantly outperforms state-of-the-art binarization algorithms. Moreover, BinaryHPE achieves performance comparable to the full-precision method Hand4Whole while using only 22.1% of the parameters and 14.8% of the operations. We will release all the code and pretrained models.
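For readers unfamiliar with binarization, the following is a minimal generic sketch, not the BiDRN architecture itself. It assumes a common 1-bit scheme in which weights are approximated by a scaling factor times their signs, and it adds an identity shortcut so that full-precision activations bypass the binarized operation. The function names `binarize` and `binary_linear_with_residual` are hypothetical and chosen only for illustration.

```python
def binarize(weights):
    """Approximate a weight matrix by alpha * sign(W), alpha = mean(|W|).

    This is a generic 1-bit scheme assumed for illustration only.
    """
    flat = [v for row in weights for v in row]
    alpha = sum(abs(v) for v in flat) / len(flat)
    # Map zero to +1 so that every entry is exactly +1 or -1.
    signs = [[1.0 if v >= 0 else -1.0 for v in row] for row in weights]
    return alpha, signs


def binary_linear_with_residual(x, weights):
    """Binarized linear layer plus a full-precision identity shortcut.

    Assumes a square weight matrix so the output and input sizes match.
    """
    alpha, w_signs = binarize(weights)
    x_signs = [1.0 if v >= 0 else -1.0 for v in x]
    # With +1/-1 operands the dot product needs no real multiplications.
    out = [alpha * sum(w * s for w, s in zip(row, x_signs)) for row in w_signs]
    # The shortcut carries the full-precision input past the binary layer.
    return [o + xi for o, xi in zip(out, x)]


if __name__ == "__main__":
    w = [[0.5, -1.5], [2.0, -1.0]]
    print(binarize(w))
    print(binary_linear_with_residual([1.0, -2.0], w))
```

Storing only signs and one scaling factor is where the memory savings of such schemes come from, while the shortcut is one simple way to keep full-precision information flowing through the network.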
@article{li2025_2311.14323,
  title={BinaryHPE: 3D Human Pose and Shape Estimation via Binarization},
  author={Zhiteng Li and Yulun Zhang and Jing Lin and Haotong Qin and Jinjin Gu and Xin Yuan and Linghe Kong and Xiaokang Yang},
  journal={arXiv preprint arXiv:2311.14323},
  year={2025}
}