Deep diffusion models excel at realistic image synthesis but demand large training sets-an obstacle in data-scarce domains like transesophageal echocardiography (TEE). While synthetic augmentation has boosted performance in transthoracic echo (TTE), TEE remains critically underrepresented, limiting the reach of deep learning in this high-impact modality.We address this gap by adapting a TTE-trained, mask-conditioned diffusion backbone to TEE with only a limited number of new cases and adapters as small as $10^5$ parameters. Our pipeline combines Low-Rank Adaptation with MaskR $^2$ , a lightweight remapping layer that aligns novel mask formats with the pretrained model's conditioning channels. This design lets users adapt models to new datasets with a different set of anatomical structures to the base model's original set.Through a targeted adaptation strategy, we find that adapting only MLP layers suffices for high-fidelity TEE synthesis. Finally, mixing less than 200 real TEE frames with our synthetic echoes improves the dice score on a multiclass segmentation task, particularly boosting performance on underrepresented right-heart structures. Our results demonstrate that (1) semantically controlled TEE images can be generated with low overhead, (2) MaskR $^2$ effectively transforms unseen mask formats into compatible formats without damaging downstream task performance, and (3) our method generates images that are effective for improving performance on a downstream task of multiclass segmentation.

View on arXiv

Comments on this paper