CSEval: A Framework for Evaluating Clinical Semantics in Text-to-Image Generation

12 February 2026

Robert Cronshaw

Konstantinos Vilouras

Junyu Yan

Yuning Du

Feng Chen

Steven McDonagh

Sotirios A. Tsaftaris



Main: 3 pages

2 figures

Bibliography: 2 pages

2 tables

Abstract

Text-to-image generation is increasingly applied in medical domains for purposes such as data augmentation and education, so evaluating the quality and clinical reliability of the generated images is essential. However, existing methods mainly assess image realism or diversity and fail to capture whether the generated images reflect the intended clinical semantics, such as anatomical location and pathology. In this study, we propose the Clinical Semantics Evaluator (CSEval), a framework that leverages language models to assess the clinical semantic alignment between generated images and their conditioning prompts. Our experiments show that CSEval identifies semantic inconsistencies overlooked by other metrics and that its scores correlate with expert judgment. CSEval provides a scalable and clinically meaningful complement to existing evaluation methods, supporting the safe adoption of generative models in healthcare.
