VLM-KG: Multimodal Radiology Knowledge Graph Generation

Vision-Language Models (VLMs) have demonstrated remarkable success in natural language generation, excelling at instruction following and structured output. Knowledge graphs play a crucial role in radiology, serving as valuable sources of factual information and enhancing various downstream tasks. However, generating radiology-specific knowledge graphs is challenging due to the specialized language of radiology reports and the limited availability of domain-specific data. Existing solutions are predominantly unimodal: they generate knowledge graphs only from radiology reports while excluding radiographic images, and they struggle with long-form radiology data due to limited context length. To address these limitations, we propose a novel multimodal VLM-based framework for knowledge graph generation in radiology. Our approach outperforms previous methods and introduces the first multimodal solution for radiology knowledge graph generation.
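To make the task concrete, a radiology knowledge graph is typically a set of (head, relation, tail) triples extracted from a report or image. The sketch below is an illustrative assumption, not the paper's actual pipeline: it parses a hypothetical line-delimited `(head | relation | tail)` serialization, one plausible structured-output format a VLM could be instructed to emit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge: head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str

def parse_triples(model_output: str) -> list[Triple]:
    # Parse line-delimited "(head | relation | tail)" strings, an
    # assumed serialization for triples emitted by a VLM; malformed
    # lines are skipped rather than raising.
    triples = []
    for line in model_output.strip().splitlines():
        parts = [p.strip() for p in line.strip("() ").split("|")]
        if len(parts) == 3:
            triples.append(Triple(*parts))
    return triples

# Hypothetical VLM output for a chest X-ray report.
output = "(left lower lobe | contains | opacity)\n(opacity | suggestive of | pneumonia)"
print(parse_triples(output))
```

The frozen dataclass makes triples hashable, so downstream code can deduplicate edges with a plain `set`.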
@article{abdullah2025_2505.17042,
  title={VLM-KG: Multimodal Radiology Knowledge Graph Generation},
  author={Abdullah Abdullah and Seong Tae Kim},
  journal={arXiv preprint arXiv:2505.17042},
  year={2025}
}