Conditional diffusion model with spatial attention and latent embedding for medical image segmentation

21 February 2025

Behzad Hejrati, Soumyanil Banerjee, Carri Glide-Hurst, Ming Dong

Abstract

Diffusion models have been used extensively for high-quality image and video generation tasks. In this paper, we propose a novel conditional diffusion model with spatial attention and latent embedding (cDAL) for medical image segmentation. In cDAL, a convolutional neural network (CNN) based discriminator is used at every time-step of the diffusion process to distinguish between the generated labels and the real ones. A spatial attention map is computed from the features learned by the discriminator to help cDAL generate more accurate segmentations of discriminative regions in an input image. Additionally, we incorporate a random latent embedding into each layer of our model to significantly reduce the number of training and sampling time-steps, thereby making it much faster than other diffusion models for image segmentation. We applied cDAL to three publicly available medical image segmentation datasets (MoNuSeg, Chest X-ray and Hippocampus) and observed significant qualitative and quantitative improvements, with higher Dice scores and mIoU than state-of-the-art algorithms. The source code is publicly available at this https URL.
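
The abstract reports results as Dice scores and mIoU. As a minimal illustration of what these overlap metrics measure, the sketch below computes them for flattened label masks in plain Python. It is not the authors' evaluation code: the function names (`dice_score`, `iou_score`, `mean_iou`), the choice to score two empty masks as 1.0, and the per-class averaging used for mIoU are all assumptions made for this example.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union |A ∩ B| / |A ∪ B| for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Unweighted mean of per-class IoU (one common definition of mIoU)."""
    per_class = [
        iou_score([p == c for p in pred], [t == c for t in target])
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes


# Toy 4-pixel example: the prediction marks one extra foreground pixel.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
print(round(dice_score(pred_mask, true_mask), 3))   # 0.667
print(round(iou_score(pred_mask, true_mask), 3))    # 0.5
print(round(mean_iou(pred_mask, true_mask, 2), 3))  # 0.583
```

On the same pair of masks Dice is never lower than IoU, which is why the two numbers are usually reported side by side rather than compared with each other.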


@article{hejrati2025_2502.06997,
  title={Conditional diffusion model with spatial attention and latent embedding for medical image segmentation},
  author={Behzad Hejrati and Soumyanil Banerjee and Carri Glide-Hurst and Ming Dong},
  journal={arXiv preprint arXiv:2502.06997},
  year={2025}
}
