
arXiv:2510.27680 (v2, latest)
PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting

31 October 2025
Danyal Maqbool
Changhee Lee
Zachary Huemann
Samuel Church
Matthew E. Larson
Scott B. Perlman
Tomas A. Romero
Joshua Warner
Meghan G. Lubner
Xin Tie
J. Merkow
Junjie Hu
Steve Y. Cho
Tyler Bradshaw
    VLM
Main: 8 pages · 5 figures · 16 tables · Bibliography: 4 pages · Appendix: 3 pages
Abstract

Generating automated reports for 3D positron emission tomography (PET) is an important and challenging task in medical imaging. PET plays a vital role in oncology, but automating report generation is difficult due to the complexity of whole-body 3D volumes, the wide range of potential clinical findings, and the limited availability of annotated datasets. To address these challenges, we first introduce PETARSeg-11K, the first large-scale, publicly available dataset to provide lesion-level correspondence between 3D PET/CT volumes and free-text radiological findings; it comprises 11,356 lesion descriptions paired with 3D segmentations. Second, we propose PETAR-4B, a 3D vision-language model designed for mask-aware, spatially grounded PET/CT reporting. PETAR-4B jointly encodes PET, CT, and 3D lesion segmentation masks, using a 3D focal prompt to capture fine-grained details of lesions that typically comprise less than 0.1% of the volume. Evaluations using automated metrics show that PETAR-4B substantially outperforms all 2D and 3D baselines. A human study involving five physicians -- the first of its kind for automated PET reporting -- confirms the model's clinical utility and establishes correlations between automated metrics and expert judgment. This work provides a foundational dataset and a novel architecture, advancing 3D medical vision-language understanding in PET.
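The abstract notes that lesions typically occupy less than 0.1% of a whole-body volume, which is why a "3D focal prompt" around the segmentation mask is used. A minimal sketch of that idea, assuming a simple bounding-box crop around the mask (the function name, margin parameter, and channel stacking are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def focal_crop(pet, ct, mask, margin=4):
    """Extract a focal sub-volume around a lesion mask.

    Hypothetical illustration: since lesions cover <0.1% of the
    whole-body volume, a tight crop around each lesion can be fed
    to the model alongside global context.
    """
    coords = np.argwhere(mask > 0)                       # voxel indices of the lesion
    lo = np.maximum(coords.min(axis=0) - margin, 0)      # lower corner, clipped to volume
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)  # upper corner, clipped
    sl = tuple(slice(a, b) for a, b in zip(lo, hi))
    # Stack PET, CT, and mask as channels of the focal region.
    return np.stack([pet[sl], ct[sl], mask[sl]], axis=0)

# Toy 32^3 volume with a small synthetic "lesion"
pet = np.random.rand(32, 32, 32)
ct = np.random.rand(32, 32, 32)
mask = np.zeros((32, 32, 32))
mask[10:13, 10:13, 10:13] = 1.0
crop = focal_crop(pet, ct, mask)
print(crop.shape)  # (3, 11, 11, 11): 3-voxel lesion + 4-voxel margin per side
```

The crop keeps the lesion at full resolution while the surrounding anatomy can be handled at coarser scale, which is one common way to reconcile fine-grained lesion detail with whole-body context.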
