Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models

6 May 2025
Kapil Wanaskar
Gaytri Jena
Magdalini Eirinaki
Abstract

This work presents an open-source unified benchmarking and evaluation framework for text-to-image generation models, with a particular focus on the impact of metadata-augmented prompts. Leveraging the DeepFashion-MultiModal dataset, we assess generated outputs through a comprehensive set of quantitative metrics, including Weighted Score, CLIP (Contrastive Language-Image Pre-training)-based similarity, LPIPS (Learned Perceptual Image Patch Similarity), FID (Fréchet Inception Distance), and retrieval-based measures, as well as qualitative analysis. Our results demonstrate that structured metadata enrichment substantially improves visual realism, semantic fidelity, and model robustness across diverse text-to-image architectures. While not a traditional recommender system, our framework enables task-specific recommendations for model selection and prompt design based on evaluation metrics.
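
Since the framework combines several off-the-shelf metrics, a minimal sketch of the CLIP-based similarity computation may help illustrate the idea. This uses OpenAI's `clip` package (https://github.com/openai/CLIP); the model choice, the normalization, and the weighted-score weights are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of CLIP-based prompt/image similarity, as used in benchmarks
# like the one described above. NOT the paper's code: the model variant
# (ViT-B/32) and the weighted-score weights below are illustrative assumptions.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_similarity(image_path: str, prompt: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and its prompt."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
    # Normalize so the dot product is a cosine similarity in [-1, 1].
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).item()

def weighted_score(clip_sim: float, lpips_dist: float, fid: float,
                   w=(0.5, 0.3, 0.2)) -> float:
    """Combine metrics into a single score; higher is better.
    LPIPS and FID are distances, so they enter negatively after a rough
    rescaling. The weights are hypothetical, not the paper's."""
    return w[0] * clip_sim - w[1] * lpips_dist - w[2] * (fid / 100.0)
```

In a benchmarking loop, `clip_similarity` would be averaged over each model's outputs for a fixed prompt set (with and without metadata augmentation), and a combined score along the lines of `weighted_score` would then rank models per task.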

View on arXiv
@article{wanaskar2025_2505.04650,
  title={Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models},
  author={Kapil Wanaskar and Gaytri Jena and Magdalini Eirinaki},
  journal={arXiv preprint arXiv:2505.04650},
  year={2025}
}