
VLM-KG: Multimodal Radiology Knowledge Graph Generation

Main: 8 pages, Bibliography: 2 pages, 2 figures, 4 tables
Abstract

Vision-Language Models (VLMs) have demonstrated remarkable success in natural language generation, excelling at instruction following and structured output generation. Knowledge graphs play a crucial role in radiology, serving as valuable sources of factual information and enhancing various downstream tasks. However, generating radiology-specific knowledge graphs presents significant challenges due to the specialized language of radiology reports and the limited availability of domain-specific data. Existing solutions are predominantly unimodal, meaning they generate knowledge graphs only from radiology reports while excluding radiographic images. Additionally, they struggle with long-form radiology data due to limited context length. To address these limitations, we propose a novel multimodal VLM-based framework for knowledge graph generation in radiology. Our approach outperforms previous methods and introduces the first multimodal solution for radiology knowledge graph generation.
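To make the target output concrete: a radiology knowledge graph is commonly represented as a set of (subject, relation, object) triples extracted from a report. The sketch below shows one minimal, hypothetical way to index such triples for lookup; the triples, relation names, and helper function are illustrative assumptions, not the paper's actual extraction pipeline or schema.

```python
from collections import defaultdict

def build_graph(triples):
    """Index (subject, relation, object) triples by subject
    for simple neighbor lookups. Illustrative only."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph

# Hypothetical triples that a report fragment like
# "Mild cardiomegaly. No pleural effusion." might yield:
triples = [
    ("cardiomegaly", "severity", "mild"),
    ("cardiomegaly", "located_in", "heart"),
    ("pleural effusion", "presence", "absent"),
]

graph = build_graph(triples)
print(graph["cardiomegaly"])
# → [('severity', 'mild'), ('located_in', 'heart')]
```

A multimodal generator would produce such triples conditioned on both the report text and the radiograph, rather than on text alone.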

@article{abdullah2025_2505.17042,
  title={VLM-KG: Multimodal Radiology Knowledge Graph Generation},
  author={Abdullah Abdullah and Seong Tae Kim},
  journal={arXiv preprint arXiv:2505.17042},
  year={2025}
}