ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.09230
23
3

Surgical Text-to-Image Generation

12 July 2024
C. Nwoye
Rupak Bose
K. Elgohary
Lorenzo Arboit
Giorgio Carlino
Joël L. Lavanchy
Pietro Mascagni
N. Padoy
    MedIm
ArXivPDFHTML
Abstract

Acquiring surgical data for research and development is significantly hindered by high annotation costs and practical and ethical constraints. Utilizing synthetically generated images could offer a valuable alternative. In this work, we explore adapting text-to-image generative models for the surgical domain using the CholecT50 dataset, which provides surgical images annotated with action triplets (instrument, verb, target). We investigate several language models and find T5 to offer more distinct features for differentiating surgical actions on triplet-based textual inputs, and showcasing stronger alignment between long and triplet-based captions. To address challenges in training text-to-image models solely on triplet-based captions without additional inputs and supervisory signals, we discover that triplet text embeddings are instrument-centric in the latent space. Leveraging this insight, we design an instrument-based class balancing technique to counteract data imbalance and skewness, improving training convergence. Extending Imagen, a diffusion-based generative model, we develop Surgical Imagen to generate photorealistic and activity-aligned surgical images from triplet-based textual prompts. We assess the model on quality, alignment, reasoning, and knowledge, achieving FID and CLIP scores of 3.7 and 26.8% respectively. Human expert survey shows that participants were highly challenged by the realistic characteristics of the generated samples, demonstrating Surgical Imagen's effectiveness as a practical alternative to real data collection.

View on arXiv
@article{nwoye2025_2407.09230,
  title={ Surgical Text-to-Image Generation },
  author={ Chinedu Innocent Nwoye and Rupak Bose and Kareem Elgohary and Lorenzo Arboit and Giorgio Carlino and Joël L. Lavanchy and Pietro Mascagni and Nicolas Padoy },
  journal={arXiv preprint arXiv:2407.09230},
  year={ 2025 }
}
Comments on this paper