VLM-KG: Multimodal Radiology Knowledge Graph Generation

Vision-Language Models (VLMs) have demonstrated remarkable success in natural language generation, excelling at instruction following and structured output. Knowledge graphs play a crucial role in radiology, serving as valuable sources of factual information and enhancing various downstream tasks. However, generating radiology-specific knowledge graphs is challenging due to the specialized language of radiology reports and the limited availability of domain-specific data. Existing solutions are predominantly unimodal: they generate knowledge graphs only from radiology reports while excluding radiographic images, and they struggle with long-form radiology data due to limited context length. To address these limitations, we propose a novel multimodal VLM-based framework for knowledge graph generation in radiology. Our approach outperforms previous methods and introduces the first multimodal solution for radiology knowledge graph generation.
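To make the task concrete, a radiology knowledge graph is typically a set of (head, relation, tail) triples extracted from a report or image. The sketch below is an illustrative assumption, not the paper's actual pipeline: it parses a hypothetical line-delimited `(head | relation | tail)` serialization, one plausible structured-output format a VLM could be instructed to emit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge: head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str

def parse_triples(model_output: str) -> list[Triple]:
    # Parse line-delimited "(head | relation | tail)" strings, an
    # assumed serialization for triples emitted by a VLM; malformed
    # lines are skipped rather than raising.
    triples = []
    for line in model_output.strip().splitlines():
        parts = [p.strip() for p in line.strip("() ").split("|")]
        if len(parts) == 3:
            triples.append(Triple(*parts))
    return triples

# Hypothetical VLM output for a chest X-ray report.
output = "(left lower lobe | contains | opacity)\n(opacity | suggestive of | pneumonia)"
print(parse_triples(output))
```

The frozen dataclass makes triples hashable, so downstream code can deduplicate edges with a plain `set`.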
@article{abdullah2025_2505.17042,
  title={VLM-KG: Multimodal Radiology Knowledge Graph Generation},
  author={Abdullah Abdullah and Seong Tae Kim},
  journal={arXiv preprint arXiv:2505.17042},
  year={2025}
}