ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.00526
10
7

Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation

1 November 2022
Jiangbin Zheng
Siyuan Li
Cheng Tan
Chong Wu
Yidong Chen
Stan Z. Li
    SLR
ArXivPDFHTML
Abstract

Sign Language (SL), as the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand. In recent years, neural Sign Language Translation (SLT), as a possible way for bridging communication gap between the deaf and the hearing people, has attracted widespread academic attention. We found that the current mainstream end-to-end neural SLT models, which tries to learning language knowledge in a weakly supervised manner, could not mine enough semantic information under the condition of low data resources. Therefore, we propose to introduce additional word-level semantic knowledge of sign language linguistics to assist in improving current end-to-end neural SLT models. Concretely, we propose a novel neural SLT model with multi-modal feature fusion based on the dynamic graph, in which the cross-modal information, i.e. text and video, is first assembled as a dynamic graph according to their correlation, and then the graph is processed by a multi-modal graph encoder to generate the multi-modal embeddings for further usage in the subsequent neural translation models. To the best of our knowledge, we are the first to introduce graph neural networks, for fusing multi-modal information, into neural sign language translation models. Moreover, we conducted experiments on a publicly available popular SLT dataset RWTH-PHOENIX-Weather-2014T. and the quantitative experiments show that our method can improve the model.

View on arXiv
Comments on this paper