Real-time acquisition of accurate scene depth is essential for automated robotic minimally invasive surgery, and stereo matching with binocular endoscopy can provide this depth. However, existing algorithms struggle with ambiguous tissue boundaries and with real-time performance in the high-resolution endoscopic scenes that are now prevalent. We propose LightEndoStereo, a lightweight real-time stereo matching method for endoscopic images. We introduce a 3D Mamba Coordinate Attention module that streamlines cost aggregation by generating position-sensitive attention maps and capturing long-range dependencies across spatial dimensions with the Mamba block. Additionally, we introduce a High-Frequency Disparity Optimization module that refines disparity estimates at tissue boundaries by enhancing high-frequency information in the wavelet domain. Our method is evaluated on the SCARED and SERV-CT datasets, achieving state-of-the-art matching accuracy and a real-time inference speed of 42 FPS. The code is available at this https URL.
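To make the link between stereo matching and depth concrete, the sketch below applies the standard rectified-stereo relation, depth = focal length × baseline / disparity, to a small disparity map. It is not taken from the LightEndoStereo code: the function name `disparity_to_depth` and the calibration values (focal length of 1000 px, baseline of 4 mm) are assumptions chosen purely for illustration.

```python
import numpy as np


def disparity_to_depth(disparity, focal_length_px, baseline_mm, min_disparity=1e-6):
    """Convert a disparity map (pixels) to depth (mm) for a rectified stereo pair.

    Pixels whose disparity is at or below `min_disparity` are treated as
    invalid and assigned infinite depth.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > min_disparity
    # Standard pinhole stereo relation: Z = f * B / d
    depth[valid] = focal_length_px * baseline_mm / disparity[valid]
    return depth


# Hypothetical calibration values, for illustration only.
disp = np.array([[10.0, 20.0],
                 [0.0, 40.0]])
depth = disparity_to_depth(disp, focal_length_px=1000.0, baseline_mm=4.0)
print(depth)
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger depth error at small disparities, which is one reason disparity accuracy at tissue boundaries matters.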
@article{ding2025_2503.00731,
  title={LightEndoStereo: A Real-time Lightweight Stereo Matching Method for Endoscopy Images},
  author={Yang Ding and Can Han and Sijia Du and Yaqi Wang and Dahong Qian},
  journal={arXiv preprint arXiv:2503.00731},
  year={2025}
}