Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology

22 February 2024
Nur Yildirim
Hannah Richardson
Maria T. A. Wetscherek
Junaid Bajwa
Joseph Jacob
Mark A. Pinnock
Stephen Harris
Daniel Coelho De Castro
Shruthi Bannur
Stephanie L. Hyland
Pratik Ghosh
M. Ranjit
Kenza Bouzid
Anton Schwaighofer
Fernando Pérez-García
Harshita Sharma
Ozan Oktay
M. Lungren
Javier Alvarez-Valle
A. Nori
Anja Thieme
Abstract

Recent advances in AI combine large language models (LLMs) with vision encoders, bringing unprecedented technical capabilities that can be leveraged for a wide range of healthcare applications. Focusing on the domain of radiology, vision-language models (VLMs) achieve strong performance on tasks such as generating radiology findings from a patient's medical image or answering visual questions (e.g., 'Where are the nodules in this chest X-ray?'). However, the clinical utility of potential applications of these capabilities is currently underexplored. We engaged in an iterative, multidisciplinary design process to envision clinically relevant VLM interactions, and co-designed four VLM use concepts: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights. We studied these concepts with 13 radiologists and clinicians, who assessed the VLM concepts as valuable yet articulated many design considerations. Reflecting on our findings, we discuss implications for integrating VLM capabilities in radiology, and for healthcare AI more generally.
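The visual question-answering interaction mentioned in the abstract follows a standard VLM inference pattern: encode an image together with a text question, then decode a free-text answer. The paper does not prescribe a specific model or implementation, so the following is a minimal sketch assuming the general-domain, open-source BLIP VQA checkpoint from Hugging Face Transformers as a stand-in for a radiology-tuned VLM; "chest_xray.png" is a placeholder image path.

```python
# Minimal VLM visual question answering sketch.
# Assumptions: Hugging Face Transformers is installed, the general-domain
# "Salesforce/blip-vqa-base" checkpoint stands in for a radiology-tuned VLM,
# and "chest_xray.png" is a placeholder path to a local image.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("chest_xray.png").convert("RGB")
question = "Where are the nodules in this chest X-ray?"

# Encode the image-question pair, then generate a short free-text answer.
inputs = processor(image, question, return_tensors="pt")
output_ids = model.generate(**inputs)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

A clinical deployment along the lines of the paper's concepts would instead use a domain-tuned model and proper DICOM handling; this sketch only illustrates the shape of the image-plus-question interaction the use concepts build on.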

View on arXiv