Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

6 September 2024
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
    VLM
Abstract

The Open-MAGVIT2 project produces an open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., $2^{18}$ codes), and achieves state-of-the-art reconstruction performance on the ImageNet and UCF benchmarks. We also provide a tokenizer pre-trained on large-scale data, significantly outperforming Cosmos on zero-shot benchmarks (1.93 vs. 0.78 rFID on ImageNet at original resolution; lower is better). Furthermore, we explore its application in plain auto-regressive models to validate scalability properties, producing a family of auto-regressive image generation models ranging from 300M to 1.5B parameters. To assist auto-regressive models in predicting with a super-large vocabulary, we factorize it into two sub-vocabularies of different sizes by asymmetric token factorization, and further introduce "next sub-token prediction" to enhance sub-token interaction for better generation quality. We release all models and code to foster innovation and creativity in the field of auto-regressive visual generation.
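
As a rough illustration of what factorizing a $2^{18}$-code vocabulary into two sub-vocabularies of different sizes can look like, the sketch below splits an 18-bit token index into a high part and a low part and then merges them back. The 12-bit/6-bit split, the function names `factorize` and `merge`, and the bit-slicing scheme itself are assumptions made for illustration; the abstract does not specify how the project performs the split.

```python
from __future__ import annotations

# Illustrative sketch only. The bit widths below are assumptions and are
# not stated in the abstract.
TOTAL_BITS = 18                      # codebook of 2**18 codes
LOW_BITS = 6                         # hypothetical smaller sub-vocabulary: 2**6 codes
HIGH_BITS = TOTAL_BITS - LOW_BITS    # hypothetical larger sub-vocabulary: 2**12 codes


def factorize(token: int) -> tuple[int, int]:
    """Split one token index into (high, low) sub-token indices."""
    if not 0 <= token < (1 << TOTAL_BITS):
        raise ValueError(f"token {token} is outside the 2**{TOTAL_BITS} vocabulary")
    high = token >> LOW_BITS               # index into the 2**12 sub-vocabulary
    low = token & ((1 << LOW_BITS) - 1)    # index into the 2**6 sub-vocabulary
    return high, low


def merge(high: int, low: int) -> int:
    """Recombine two sub-token indices into the original token index."""
    if not 0 <= high < (1 << HIGH_BITS):
        raise ValueError(f"high sub-token {high} is out of range")
    if not 0 <= low < (1 << LOW_BITS):
        raise ValueError(f"low sub-token {low} is out of range")
    return (high << LOW_BITS) | low


if __name__ == "__main__":
    high, low = factorize(2**18 - 1)
    print(high, low, merge(high, low))
```

Under this assumed split, a model would predict over 4096 and 64 classes per position instead of a single 262144-way classification, which is the kind of reduction the factorization is meant to provide.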

@article{luo2025_2409.04410,
  title={Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation},
  author={Zhuoyan Luo and Fengyuan Shi and Yixiao Ge and Yujiu Yang and Limin Wang and Ying Shan},
  journal={arXiv preprint arXiv:2409.04410},
  year={2025}
}