
Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge

22 July 2025

Tobias Rueckert

David Rauber

Raphaela Maerkl

Leonard Klausmann

Suemeyye R. Yildiran

Max Gutbrod

Danilo Weber Nunes

Alvaro Fernandez Moreno

Imanol Luengo

Danail Stoyanov

Nicolas Toussaint

Enki Cho

Hyeon Bae Kim

Oh Sung Choo

Ka Young Kim

Seong Tae Kim

Gonçalo Arantes

Kehan Song

Jianjun Zhu

Junchen Xiong

Tingyi Lin

Shunsuke Kikuchi

Hiroki Matsuzaki

Atsushi Kouno

João Renato Ribeiro Manesco

João Paulo Papa

Tae-Min Choi

Tae Kyeong Jeong

Juyoun Park

Oluwatosin Alabi

Meng Wei

Tom Vercauteren

Runzhi Wu

Mengya Xu

An Wang

Long Bai

Hongliang Ren

Amine Yamlahi

Jakob Hennighausen

Lena Maier-Hein

Satoshi Kondo

Satoshi Kasai

Kousuke Hirasawa

Shu Yang

Yihui Wang

Hao Chen

Santiago Rodríguez

Nicolás Aparicio

Leonardo Manrique

Juan Camilo Lyons

Olivia Hosie

Nicolás Ayobi

Pablo Arbeláez

Yiping Li

Yasmina Al Khalil

Sahar Nasirihaghighi

Stefanie Speidel

Daniel Rueckert

Hubertus Feussner

Dirk Wilhelm

Christoph Palm


Main: 32 pages

16 figures

Bibliography: 5 pages

17 tables

Abstract

Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, achieving robust performance under real-world conditions remains a significant challenge. Incorporating surgical context, such as the current procedural phase, has emerged as a promising strategy to improve both robustness and interpretability.

To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data, while supporting the integration of temporal information across entire procedures.

We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS, and it offers a high-quality resource to support future research in surgical scene understanding.
