Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform

21 April 2025
Xianpan Zhou
    VGen
Abstract

The recent surge in open-source text-to-video generation models has significantly energized the research community, yet their dependence on proprietary training datasets remains a key constraint. While existing open datasets such as Koala-36M apply algorithmic filtering to web-scraped videos from early platforms, they still lack the quality required to fine-tune advanced video generation models. We present Tiger200K, a manually curated, high-visual-quality video dataset sourced from User-Generated Content (UGC) platforms. By prioritizing visual fidelity and aesthetic quality, Tiger200K underscores the critical role of human expertise in data curation. It provides high-quality, temporally consistent video-text pairs for fine-tuning and optimizing video generation architectures, produced by a simple but effective pipeline that includes shot boundary detection, OCR, border detection, motion filtering, and fine-grained bilingual captioning. The dataset will continue to expand and will be released as an open-source initiative to advance research and applications in video generative models. Project page: this https URL
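The filtering stages named in the abstract can be illustrated with a minimal sketch. It is not the Tiger200K implementation: frames are assumed to be small grayscale images stored as nested lists, and every function name and threshold (`split_shots`, `border_rows`, `motion_score`, `keep_shot`, `cut_threshold`, `dark_level`, and so on) is hypothetical. Shot boundaries are found with a simple mean-absolute-difference cut detector, borders are counted as near-black rows at the top and bottom of a frame, and motion is the average frame-to-frame change within a shot. OCR and bilingual captioning require trained models and are omitted.

```python
from typing import List

# Hypothetical representation: a grayscale frame as rows of 0-255 intensities.
Frame = List[List[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total, count = 0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def split_shots(frames: List[Frame], cut_threshold: float = 40.0) -> List[List[Frame]]:
    """Assumed cut detector: start a new shot when consecutive frames
    differ by more than cut_threshold."""
    if not frames:
        return []
    shots = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if mean_abs_diff(prev, cur) > cut_threshold:
            shots.append([cur])  # large jump: treat as a shot boundary
        else:
            shots[-1].append(cur)
    return shots


def border_rows(frame: Frame, dark_level: int = 16) -> int:
    """Count near-black rows at the top and bottom (letterbox bars)."""
    def is_dark(row: List[int]) -> bool:
        return max(row) <= dark_level

    top = 0
    while top < len(frame) and is_dark(frame[top]):
        top += 1
    bottom = 0
    # Stop before re-counting rows already counted from the top.
    while bottom < len(frame) - top and is_dark(frame[-1 - bottom]):
        bottom += 1
    return top + bottom


def motion_score(shot: List[Frame]) -> float:
    """Average frame-to-frame change within a shot (0.0 for single frames)."""
    if len(shot) < 2:
        return 0.0
    diffs = [mean_abs_diff(a, b) for a, b in zip(shot, shot[1:])]
    return sum(diffs) / len(diffs)


def keep_shot(shot: List[Frame], min_len: int = 2, max_border_frac: float = 0.1,
              min_motion: float = 1.0, max_motion: float = 30.0) -> bool:
    """Hypothetical filter: drop short shots, letterboxed shots,
    near-static shots, and shots with excessive motion."""
    if len(shot) < min_len:
        return False
    first = shot[0]
    if border_rows(first) / len(first) > max_border_frac:
        return False
    return min_motion <= motion_score(shot) <= max_motion


def flat(value: int, h: int = 10, w: int = 10) -> Frame:
    """Synthetic uniform frame, used only for the demo below."""
    return [[value] * w for _ in range(h)]


if __name__ == "__main__":
    frames = [flat(100), flat(102), flat(104), flat(200), flat(200)]
    shots = split_shots(frames)
    print([len(s) for s in shots], [keep_shot(s) for s in shots])
```

Here the jump from intensity 104 to 200 is treated as a cut, the first shot passes the filters, and the second is rejected as static. Practical pipelines generally decode real video and use histogram- or learning-based cut detectors; the pure-Python loops above only keep the example self-contained.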

@article{zhou2025_2504.15182,
  title={Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform},
  author={Xianpan Zhou},
  journal={arXiv preprint arXiv:2504.15182},
  year={2025}
}