This work tackles the challenge of continuous sign language segmentation, a key task with significant implications for sign language translation and data annotation. We propose a transformer-based architecture that models the temporal dynamics of signing and frames segmentation as a sequence labeling problem using the Begin-In-Out (BIO) tagging scheme. Our method uses HaMeR hand features, complemented with 3D Angles. Extensive experiments show that our model achieves state-of-the-art results on the DGS Corpus, while our features surpass prior benchmarks on BSLCorpus.
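To illustrate how BIO tagging turns segmentation into sequence labeling, the following is a minimal sketch of decoding per-frame tags back into sign spans. It assumes each frame carries one of the tags "B" (first frame of a sign), "I" (inside a sign), or "O" (outside any sign); the function name `bio_to_segments` and the example tag sequence are hypothetical and are not taken from the paper's implementation.

```python
from typing import List, Tuple


def bio_to_segments(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert per-frame B/I/O tags into (start, end) frame spans, end inclusive."""
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the currently open sign, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new sign begins; close any sign that is still open.
            if start is not None:
                segments.append((start, i - 1))
            start = i
        elif tag == "I":
            # Assumption: an "I" with no preceding "B" opens a sign anyway.
            if start is None:
                start = i
        else:  # "O": this frame belongs to no sign
            if start is not None:
                segments.append((start, i - 1))
                start = None
    # Close a sign that runs to the end of the sequence.
    if start is not None:
        segments.append((start, len(tags) - 1))
    return segments


# Hypothetical tag sequence for a 13-frame clip.
tags = list("OOBIIOOBIBIIO")
segments = bio_to_segments(tags)
print(segments)  # [(2, 4), (7, 8), (9, 11)]
```

Note that the "B" tag is what lets two back-to-back signs (frames 7-8 and 9-11 above) be separated even when no "O" frame lies between them, which a plain in/out labeling could not express.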
@article{he2025_2504.08593,
  title={Hands-On: Segmenting Individual Signs from Continuous Sequences},
  author={Low Jian He and Harry Walsh and Ozge Mercanoglu Sincan and Richard Bowden},
  journal={arXiv preprint arXiv:2504.08593},
  year={2025}
}