
Spotter+GPT: Turning Sign Spottings into Sentences with LLMs

International Conference on Intelligent Virtual Agents (IVA), 2024
Main: 5 pages, Bibliography: 1 page, 2 figures, 5 tables
Abstract

Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a lightweight, modular SLT framework, Spotter+GPT, that leverages the power of Large Language Models (LLMs) and avoids heavy end-to-end training. Spotter+GPT breaks down the SLT task into two distinct stages. First, a sign spotter identifies individual signs within the input video. The spotted signs are then passed to an LLM, which transforms them into meaningful spoken language sentences. Spotter+GPT eliminates the requirement for SLT-specific training, significantly reducing computational costs and time requirements. The source code and pretrained weights of the Spotter are available at this https URL.
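The two-stage pipeline described in the abstract can be sketched in a few lines. This is an illustrative mock-up, not the paper's implementation: the function names, the stand-in gloss sequence, and the prompt wording are all assumptions, and the real system would run a trained sign spotter on video frames and query an actual LLM.

```python
def spot_signs(video_frames):
    """Stage 1 (stand-in): a sign spotter detects individual signs in the
    input video. Here we return a fixed, hypothetical gloss sequence for
    illustration instead of running a real model."""
    return ["YESTERDAY", "I", "SHOP", "GO"]


def build_llm_prompt(glosses):
    """Stage 2 input: the spotted glosses are handed to an LLM, which is
    asked to turn them into a fluent spoken-language sentence. The prompt
    text below is illustrative, not taken from the paper."""
    return ("Convert this sequence of sign language glosses into a natural "
            "English sentence: " + " ".join(glosses))


glosses = spot_signs(video_frames=[])  # no real video in this sketch
prompt = build_llm_prompt(glosses)
print(prompt)
```

Because the stages are decoupled, neither component needs SLT-specific training: the spotter is trained for sign recognition alone, and the LLM is used off the shelf.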
