Accurate depth estimation enhances endoscopy navigation and diagnostics, but obtaining ground-truth depth in clinical settings is challenging. Synthetic datasets are often used for training, yet the domain gap limits generalization to real data. We propose a novel image-to-image translation framework that preserves structure while generating realistic textures from clinical data. Our key innovation integrates Stable Diffusion with ControlNet, conditioned on a latent representation extracted from a Per-Pixel Shading (PPS) map. PPS captures surface lighting effects, providing a stronger structural constraint than depth maps. Experiments show that our approach produces more realistic translations and improves depth estimation over the GAN-based MI-CycleGAN. Our code is publicly available at this https URL.
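For readers unfamiliar with Per-Pixel Shading, the sketch below illustrates one way such a map can be computed from a depth map and pinhole intrinsics, assuming a point light co-located with the camera, inverse-square falloff, and a Lambertian surface. The function names and the final normalization step are illustrative assumptions, not the authors' implementation; in the paper, the PPS map is additionally encoded into a latent representation before being used as the ControlNet condition.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W), in metres, to 3D points (H, W, 3) in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def normals_from_points(points):
    """Estimate per-pixel surface normals from finite differences of the point map."""
    tx = np.gradient(points, axis=1)   # tangent along image columns
    ty = np.gradient(points, axis=0)   # tangent along image rows
    n = np.cross(tx, ty)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    # Flip normals so they face the camera (camera at the origin, looking down +z).
    return np.where(n[..., 2:3] > 0, -n, n)

def pps_map(depth, fx, fy, cx, cy, light_pos=(0.0, 0.0, 0.0)):
    """Per-Pixel Shading under a point light, by default co-located with the camera:
    PPS = max(0, N . L) / r^2, where L is the unit direction from the surface point
    to the light and r^2 gives the inverse-square attenuation."""
    points = backproject_depth(depth, fx, fy, cx, cy)
    normals = normals_from_points(points)
    to_light = np.asarray(light_pos) - points
    r2 = np.sum(to_light ** 2, axis=-1, keepdims=True)
    L = to_light / (np.sqrt(r2) + 1e-8)
    cosine = np.clip(np.sum(normals * L, axis=-1, keepdims=True), 0.0, None)
    pps = cosine / (r2 + 1e-8)
    # Normalize to [0, 1] so the map can be used directly as a conditioning image.
    return (pps / (pps.max() + 1e-8))[..., 0]

# Example with a toy 64x64 depth map and illustrative pinhole intrinsics.
depth = 0.05 + 0.01 * np.random.rand(64, 64)            # depths in metres
pps = pps_map(depth, fx=150.0, fy=150.0, cx=32.0, cy=32.0)
```

Because the shading term couples surface orientation with the inverse-square distance to the light, the resulting map reflects the near-field lighting typical of endoscopy, which is why it can act as a stronger structural constraint than a raw depth map.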
@article{xiong2025_2504.17067,
  title   = {PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation},
  author  = {Xinqi Xiong and Andrea Dunn Beltran and Jun Myeong Choi and Marc Niethammer and Roni Sengupta},
  journal = {arXiv preprint arXiv:2504.17067},
  year    = {2025}
}