VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion

11 March 2025
Lehan Yang, Jincen Song, Tianlong Wang, Daiqing Qi, Weili Shi, Yuheng Liu, Sheng Li
Communities: DiffM, VOS, VGen
Abstract

We propose a new task, video referring matting, which obtains the alpha matte of a specified instance by inputting a referring caption. We treat the dense prediction task of matting as video generation, leveraging the text-to-video alignment prior of video diffusion models to generate alpha mattes that are temporally coherent and closely related to the corresponding semantic instances. Moreover, we propose a new Latent-Constructive loss to further distinguish different instances, enabling more controllable interactive matting. Additionally, we introduce a large-scale video referring matting dataset with 10,000 videos. To the best of our knowledge, this is the first dataset that concurrently contains captions, videos, and instance-level alpha mattes. Extensive experiments demonstrate the effectiveness of our method. The dataset and code are available at this https URL.
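The abstract does not spell out the Latent-Constructive loss, so the following is only a minimal sketch of a generic contrastive-style objective over per-instance latent embeddings; the function name, pooling assumption, and margin are illustrative choices and should not be read as the authors' actual formulation.

```python
# Illustrative sketch only: a generic contrastive-style penalty that pushes the
# latents of different referred instances apart, assuming one pooled latent
# vector per instance. All names and hyperparameters here are assumptions.
import torch
import torch.nn.functional as F


def latent_separation_loss(instance_latents: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Penalize pairs of instance latents that are too similar.

    instance_latents: (N, D) tensor, one pooled latent per referred instance.
    """
    z = F.normalize(instance_latents, dim=-1)   # unit-normalize latents
    sim = z @ z.t()                             # pairwise cosine similarity, shape (N, N)
    n = z.size(0)
    mask = ~torch.eye(n, dtype=torch.bool, device=z.device)
    off_diag = sim[mask]                        # keep only inter-instance pairs
    # Hinge on inter-instance similarity: only pairs above (1 - margin) contribute.
    return F.relu(off_diag - (1.0 - margin)).mean()


# Example usage with random stand-in latents for three instances:
loss = latent_separation_loss(torch.randn(3, 64))
```

In practice such a term would be added to the diffusion training objective with a weighting coefficient; how (or whether) the paper combines these terms is not stated in this abstract.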

@article{yang2025_2503.10678,
  title={VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion},
  author={Lehan Yang and Jincen Song and Tianlong Wang and Daiqing Qi and Weili Shi and Yuheng Liu and Sheng Li},
  journal={arXiv preprint arXiv:2503.10678},
  year={2025}
}