The demand for high-quality synthetic data for model training and augmentation has never been greater in medical imaging. However, current evaluations rely predominantly on computational metrics that fail to align with human expert recognition, yielding synthetic images that may appear realistic numerically but lack clinical authenticity. This poses a significant challenge for the reliability and effectiveness of AI-driven medical tools. To address this gap, we introduce GazeVal, a practical framework that synergizes expert eye-tracking data with direct radiological evaluations to assess the quality of synthetic medical images. GazeVal leverages the gaze patterns of radiologists, which offer a deeper understanding of how experts perceive and interact with synthetic data across different tasks (i.e., diagnostic and Turing tests). Experiments with sixteen radiologists revealed that 96.6% of the images generated by a recent state-of-the-art AI algorithm were identified as fake, demonstrating the limitations of generative AI in producing clinically accurate images.
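To make the core idea concrete, the sketch below illustrates one plausible way gaze data could be used to compare expert attention on real versus synthetic images: fixation points are rendered into smoothed heatmaps and scored with a Pearson correlation. This is a minimal illustration under our own assumptions, not GazeVal's actual pipeline (the abstract does not specify its gaze-analysis method); the function names, the Gaussian smoothing width, and the choice of correlation as the similarity metric are all hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaze_heatmap(fixations, shape, sigma=25.0):
    """Render (x, y) fixation points as a smoothed attention map.
    Hypothetical helper; sigma approximates foveal spread in pixels."""
    heat = np.zeros(shape, dtype=np.float64)
    for x, y in fixations:
        heat[int(y), int(x)] += 1.0
    heat = gaussian_filter(heat, sigma=sigma)
    return heat / (heat.sum() + 1e-12)  # normalize to a probability map

def gaze_similarity(fix_real, fix_synth, shape):
    """Pearson correlation between two gaze heatmaps; a low score
    suggests experts scan the synthetic image differently."""
    a = gaze_heatmap(fix_real, shape).ravel()
    b = gaze_heatmap(fix_synth, shape).ravel()
    return np.corrcoef(a, b)[0, 1]

# Toy usage: fixations recorded on a matched real/synthetic image pair.
real_fix = [(120, 80), (130, 85), (300, 210)]
synth_fix = [(40, 40), (400, 300), (410, 310)]
print(gaze_similarity(real_fix, synth_fix, shape=(512, 512)))
```

Correlation between fixation maps is only one of several standard saliency-comparison metrics; alternatives such as KL divergence or normalized scanpath saliency could be substituted without changing the overall scheme.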
@article{wong2025_2503.20967,
  title   = {Eyes Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical Imaging},
  author  = {David Wong and Bin Wang and Gorkem Durak and Marouane Tliba and Akshay Chaudhari and Aladine Chetouani and Ahmet Enis Cetin and Cagdas Topel and Nicolo Gennaro and Camila Lopes Vendrami and Tugce Agirlar Trabzonlu and Amir Ali Rahsepar and Laetitia Perronne and Matthew Antalek and Onural Ozturk and Gokcan Okur and Andrew C. Gordon and Ayis Pyrros and Frank H. Miller and Amir Borhani and Hatice Savas and Eric Hart and Drew Torigian and Jayaram K. Udupa and Elizabeth Krupinski and Ulas Bagci},
  journal = {arXiv preprint arXiv:2503.20967},
  year    = {2025}
}