
SignX: Continuous Sign Recognition in Compact Pose-Rich Latent Space

Main: 8 pages · Bibliography: 5 pages · Appendix: 10 pages · 10 figures · 9 tables

Abstract

Processing sign language data is complex and raises many challenges. Current approaches to recognizing ASL signs translate RGB sign language videos, via pose information, into English-based ID Glosses, which uniquely identify ASL signs. This paper proposes SignX, a novel framework for continuous sign language recognition in a compact, pose-rich latent space. First, we construct a unified latent representation that encodes heterogeneous pose formats (SMPLer-X, DWPose, MediaPipe, PrimeDepth, and Sapiens Segmentation) into a compact, information-dense space. Second, we train a ViT-based Video2Pose module to extract this latent representation directly from raw videos. Finally, we develop a temporal modeling and sequence refinement method that operates entirely in this latent space. This multi-stage design achieves end-to-end sign language recognition while significantly reducing computational cost. Experimental results demonstrate that SignX achieves state-of-the-art accuracy on continuous sign language recognition.
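
The three stages described in the abstract can be pictured with a toy data-flow sketch. This is not the authors' implementation: the feature sizes, the random linear projections, the function names (`fuse_poses`, `video_to_latent`, `recognize`), and the repeat-collapsing decoder are all hypothetical stand-ins, chosen only to show how heterogeneous pose features, raw frames, and gloss predictions could share one compact latent space.

```python
import numpy as np

# Hypothetical per-frame feature sizes for each pose source (illustrative only).
POSE_DIMS = {"smplerx": 32, "dwpose": 24, "mediapipe": 16, "primedepth": 8, "sapiens": 8}
LATENT_DIM = 12   # assumed size of the compact latent
VOCAB_SIZE = 6    # assumed size of a toy gloss vocabulary


def fuse_poses(pose_feats, w_fuse):
    """Stage 1 stand-in: concatenate per-frame pose features, project to the latent."""
    stacked = np.concatenate([pose_feats[name] for name in POSE_DIMS], axis=-1)
    return stacked @ w_fuse


def video_to_latent(frames, w_video):
    """Stage 2 stand-in: flatten each frame and project it to the same latent.
    The paper uses a ViT-based module here; a linear map is only a placeholder."""
    flat = frames.reshape(frames.shape[0], -1)
    return flat @ w_video


def recognize(latents, w_gloss):
    """Stage 3 stand-in: per-frame argmax over glosses, then collapse repeats.
    This simple decoder is an assumption, not the paper's refinement method."""
    ids = (latents @ w_gloss).argmax(axis=-1)
    keep = np.concatenate([[True], ids[1:] != ids[:-1]])
    return ids[keep].tolist()


rng = np.random.default_rng(0)
num_frames = 5
frames = rng.standard_normal((num_frames, 4, 4, 3))  # tiny fake RGB clip
pose_feats = {k: rng.standard_normal((num_frames, d)) for k, d in POSE_DIMS.items()}

w_fuse = rng.standard_normal((sum(POSE_DIMS.values()), LATENT_DIM))
w_video = rng.standard_normal((4 * 4 * 3, LATENT_DIM))
w_gloss = rng.standard_normal((LATENT_DIM, VOCAB_SIZE))

pose_latents = fuse_poses(pose_feats, w_fuse)      # would serve as training targets
video_latents = video_to_latent(frames, w_video)   # what inference would start from
glosses = recognize(video_latents, w_gloss)
```

In this sketch `pose_latents` and `video_latents` have the same shape, which is the point of the design: once the video module is trained to reproduce the pose-derived latent, the recognizer never needs the individual pose estimators at inference time.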
