MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach

Face editing modifies the appearance of a face and plays a key role in the customization and enhancement of personal images. Although many works have achieved remarkable success in text-driven face editing, they still face significant challenges, as none of them simultaneously provides diversity, controllability, and flexibility. To address this challenge, we propose MuseFace, a text-driven face editing framework that relies solely on a text prompt to perform face editing. Specifically, MuseFace integrates a Text-to-Mask diffusion model, which generates fine-grained semantic masks directly from text, with a semantic-aware face editing model, which performs the edit guided by those masks. The Text-to-Mask diffusion model provides \textit{diversity} and \textit{flexibility} to the framework, while the semantic-aware face editing model ensures its \textit{controllability}. Because the framework can create fine-grained semantic masks, it makes precise face editing possible and significantly enhances the controllability and flexibility of face editing models. Extensive experiments demonstrate that MuseFace achieves superior, high-fidelity results.
@article{zhang2025_2503.23888,
  title={MuseFace: Text-driven Face Editing via Diffusion-based Mask Generation Approach},
  author={Xin Zhang and Siting Huang and Xiangyang Luo and Yifan Xie and Weijiang Yu and Heng Chang and Fei Ma and Fei Yu},
  journal={arXiv preprint arXiv:2503.23888},
  year={2025}
}