
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools

Abstract

Tool tracking in surgical videos is essential for advancing computer-assisted interventions, such as skill assessment, safety zone estimation, and human-machine collaboration. However, the lack of context-rich datasets limits AI applications in this field. Existing datasets rely on overly generic tracking formalizations that fail to capture surgical-specific dynamics, such as tools moving out of the camera's view or exiting the body. This results in less clinically relevant trajectories and a lack of flexibility for real-world surgical applications. Methods trained on these datasets often struggle with visual challenges such as smoke, reflection, and bleeding, further exposing the limitations of current approaches. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. It redefines tracking formalization with three perspectives: (i) intraoperative, (ii) intracorporeal, and (iii) visibility, enabling adaptable and clinically meaningful tool trajectories. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances. Annotations include spatial location, category, identity, operator, phase, and scene visual challenges. Benchmarking state-of-the-art methods on CholecTrack20 reveals significant performance gaps, with current approaches (< 45% HOTA) failing to meet the accuracy required for clinical translation. These findings motivate the need for advanced and intuitive tracking algorithms and establish CholecTrack20 as a foundation for developing robust AI-driven surgical assistance systems.
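To make the multi-perspective formalization concrete, the sketch below shows how a single tool detection could carry three distinct track identities, one per perspective: a visibility identity that persists only while the tool is on screen, an intracorporeal identity that persists while the tool stays inside the body, and an intraoperative identity that spans the whole procedure. This is a hypothetical Python layout for illustration only; the field names (bbox, operator, visibility_id, etc.) are assumptions and do not reflect the dataset's actual annotation schema.

from dataclasses import dataclass

@dataclass
class ToolInstance:
    # Hypothetical per-frame annotation for one tool instance.
    # Field names are illustrative; consult the CholecTrack20 release
    # for the actual schema.
    frame_id: int            # frame index at 1 fps
    bbox: tuple              # (x, y, w, h) spatial location
    category: str            # tool class, e.g. "grasper"
    operator: str            # which operator handles the tool
    phase: str               # surgical phase label
    visual_challenge: str    # e.g. "smoke", "bleeding", "none"
    # One identity per tracking perspective:
    visibility_id: int       # persists only while the tool is in view
    intracorporeal_id: int   # persists while the tool is inside the body
    intraoperative_id: int   # persists across the whole procedure

# Example: the same physical grasper leaves the camera view and re-enters.
# Its visibility-track identity changes, but its intracorporeal and
# intraoperative identities are preserved.
before_exit = ToolInstance(100, (120, 80, 60, 40), "grasper", "main surgeon",
                           "calot-triangle-dissection", "none", 3, 1, 1)
after_reentry = ToolInstance(160, (200, 90, 58, 42), "grasper", "main surgeon",
                             "calot-triangle-dissection", "smoke", 4, 1, 1)

assert before_exit.visibility_id != after_reentry.visibility_id
assert before_exit.intracorporeal_id == after_reentry.intracorporeal_id
assert before_exit.intraoperative_id == after_reentry.intraoperative_id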

@article{nwoye2025_2312.07352,
  title={CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools},
  author={Chinedu Innocent Nwoye and Kareem Elgohary and Anvita Srinivas and Fauzan Zaid and Joël L. Lavanchy and Nicolas Padoy},
  journal={arXiv preprint arXiv:2312.07352},
  year={2025}
}