
Hierarchical Section Matching Prediction (HSMP) BERT for Fine-Grained Extraction of Structured Data from Hebrew Free-Text Radiology Reports in Crohn's Disease

Main: 11 Pages
6 Figures
Bibliography: 1 Page
9 Tables
Appendix: 9 Pages
Abstract

Extracting structured clinical information from radiology reports is challenging, especially in low-resource languages. The challenge is pronounced in Crohn's disease, where multi-organ findings are sparsely represented. We developed Hierarchical Structured Matching Prediction BERT (HSMP-BERT), a prompt-based model for extraction from Hebrew radiology text. In an administrative database study, we analyzed 9,683 reports from Crohn's patients imaged 2010-2023 across Israeli providers. A subset of 512 reports was radiologist-annotated for findings across six gastrointestinal organs and 15 pathologies, yielding 90 structured labels per subject. We used a multilabel-stratified split (66% train+validation; 33% test) that preserves label prevalence. Performance was evaluated with accuracy, F1, Cohen's κ, AUC, PPV, NPV, and recall. On 24 organ-finding combinations with >15 positives, HSMP-BERT achieved mean F1 0.83±0.08 and κ 0.65±0.17, outperforming the SMP zero-shot baseline (F1 0.49±0.07, κ 0.06±0.07) and standard fine-tuning (F1 0.30±0.27, κ 0.27±0.34; paired t-test $p < 10^{-7}$). Hierarchical inference cuts runtime 5.1× vs. traditional inference. Applied to all reports, the model revealed associations among ileal wall thickening, stenosis, and pre-stenotic dilatation, plus age- and sex-specific trends in inflammatory findings. HSMP-BERT offers a scalable solution for structured extraction in radiology, enabling population-level analysis of Crohn's disease and demonstrating AI's potential in low-resource settings.
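
To make two of the reported metrics concrete, the following is a minimal sketch of how F1 and Cohen's κ can be computed for a single binary organ-finding label. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and it uses positive-class F1, whereas the abstract does not state which averaging the reported F1 values use.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for one binary label (1 = finding present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Positive-class F1: harmonic mean of precision (PPV) and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal rates of each class.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if p_expected == 1.0:
        return 0.0  # degenerate case: only one class appears on both sides
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels for one organ-finding pair (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_score(tp, fp, fn):.3f}")
print(f"kappa = {cohen_kappa(tp, fp, fn, tn):.3f}")
```

Because κ subtracts the agreement expected by chance, it can stay near zero for a model whose F1 looks moderate, which is one reason to report both metrics on sparse labels.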
