ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics

29 August 2024
Oishi Banerjee
Agustina Saenz
Kay Wu
Warren Clements
Adil Zia
Dominic Buensalido
H. Kavnoudias
A. Abi-Ghanem
Nour El Ghawi
Cibele Luna
Patricia Castillo
Khaled Al-Surimi
R. Daghistani
Yuh-Min Chen
Heng-sheng Chao
Lars Heiliger
Moon Kim
Johannes Haubold
F. Jonske
Pranav Rajpurkar
Abstract

Given the rapidly expanding capabilities of generative AI models for radiology, there is a need for robust metrics that can accurately measure the quality of AI-generated radiology reports across diverse hospitals. We develop ReXamine-Global, an LLM-powered, multi-site framework that tests metrics across different writing styles and patient populations, exposing gaps in their generalization. First, our method tests whether a metric is undesirably sensitive to reporting style, providing different scores depending on whether AI-generated reports are stylistically similar to ground-truth reports or not. Second, our method measures whether a metric reliably agrees with experts, or whether metric and expert scores of AI-generated report quality diverge for some sites. Using 240 reports from 6 hospitals around the world, we apply ReXamine-Global to 7 established report evaluation metrics and uncover serious gaps in their generalizability. Developers can apply ReXamine-Global when designing new report evaluation metrics, ensuring their robustness across sites. Additionally, our analysis of existing metrics can guide users of those metrics towards evaluation procedures that work reliably at their sites of interest.
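
The two checks described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of a mean score gap for style sensitivity, and the use of Pearson correlation for per-site expert agreement are all assumptions chosen for illustration, and the abstract does not specify the statistics the framework actually uses.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def style_sensitivity(scores_style_matched, scores_style_mismatched):
    """Hypothetical check 1: gap in mean metric score when the same
    AI-generated reports are compared against ground-truth reports
    written in a matching style versus a different style.
    A gap far from zero suggests the metric reacts to style."""
    return mean(scores_style_matched) - mean(scores_style_mismatched)


def expert_agreement_by_site(metric_scores, expert_scores):
    """Hypothetical check 2: per-site correlation between metric scores
    and expert scores. Both arguments map a site name to a list of
    scores for the same reports, in the same order."""
    return {
        site: pearson(metric_scores[site], expert_scores[site])
        for site in metric_scores
    }
```

Under these assumptions, a metric generalizes well if `style_sensitivity` stays near zero and the values returned by `expert_agreement_by_site` are similar across sites; a site whose correlation differs sharply from the others would indicate that metric and expert scores diverge there.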
