MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input

Abstract

Recent advancements in Virtual Try-On (VITON) have significantly improved image realism and garment detail preservation, driven by powerful text-to-image (T2I) diffusion models. However, existing methods often rely on user-provided masks, introducing complexity and performance degradation due to imperfect inputs, as shown in Fig. 1(a). To address this, we propose a Mask-Free VITON (MF-VITON) framework that achieves realistic VITON using only a single person image and a target garment, eliminating the requirement for auxiliary masks. Our approach introduces a novel two-stage pipeline: (1) We leverage existing mask-based VITON models to synthesize a high-quality dataset containing diverse, realistic pairs of person images and corresponding garments, augmented with varied backgrounds to mimic real-world scenarios. (2) The pre-trained mask-based model is fine-tuned on the generated dataset, enabling garment transfer without mask dependencies. This stage simplifies the input requirements while preserving garment texture and shape fidelity. Our framework achieves state-of-the-art (SOTA) performance in garment transfer accuracy and visual realism. Notably, the proposed mask-free model significantly outperforms existing mask-based approaches, setting a new benchmark with a substantial lead over previous methods. For more details, visit our project page: this https URL.
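The two-stage pipeline in the abstract can be sketched in code. This is a minimal illustrative skeleton, not the authors' implementation: all function names (`mask_based_tryon`, `build_dataset`, `finetune_mask_free`) are hypothetical placeholders standing in for a diffusion-based VITON model and its fine-tuning loop.

```python
# Sketch of the MF-VITON two-stage pipeline (hypothetical names, not the
# authors' code): Stage 1 uses a mask-based model to synthesize training
# pairs; Stage 2 adapts the model so inference needs no mask.

def mask_based_tryon(person, garment, mask):
    """Stage 1 stub: a pre-trained mask-based VITON model produces a
    try-on image from (person image, garment image, mask)."""
    # A real model would run masked diffusion inpainting here.
    return {"person": person, "garment": garment, "tryon": f"{person}+{garment}"}

def build_dataset(people, garments, masks):
    """Stage 1: generate (person, garment, try-on) triplets with the
    mask-based model; masks are consumed only at this stage."""
    return [mask_based_tryon(p, g, m) for p, g, m in zip(people, garments, masks)]

def finetune_mask_free(model_fn, dataset):
    """Stage 2 stub: fine-tune on the synthetic pairs so the model learns
    the mapping (person, garment) -> try-on with no mask input."""
    # Stand-in for gradient-based fine-tuning: memorize the synthetic pairs.
    lookup = {(ex["person"], ex["garment"]): ex["tryon"] for ex in dataset}

    def mask_free_tryon(person, garment):
        # Inference takes only the two images the user actually has.
        return lookup[(person, garment)]

    return mask_free_tryon

dataset = build_dataset(["p1", "p2"], ["g1", "g2"], ["m1", "m2"])
tryon = finetune_mask_free(mask_based_tryon, dataset)
result = tryon("p1", "g1")  # mask-free inference: person + garment only
```

The point of the sketch is the data flow: masks exist only inside Stage 1's dataset synthesis, so the Stage 2 model's interface shrinks to exactly the two inputs a user can provide.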

@article{wan2025_2503.08650,
  title={MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input},
  author={Zhenchen Wan and Yanwu Xu and Dongting Hu and Weilun Cheng and Tianxi Chen and Zhaoqing Wang and Feng Liu and Tongliang Liu and Mingming Gong},
  journal={arXiv preprint arXiv:2503.08650},
  year={2025}
}