Consecutive Decoding for Speech-to-text Translation

AAAI Conference on Artificial Intelligence (AAAI), 2020
Abstract

Speech-to-text translation (ST), which directly translates source-language speech into target-language text, has attracted intense attention recently. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral framework for speech-to-text translation. Our method is verified on three mainstream datasets: the Augmented LibriSpeech English-French dataset, the TED English-German dataset, and the TED English-Chinese dataset. Experiments show that our proposed COSTT outperforms the previous state-of-the-art methods. Our code is available at https://github.com/dqqcasia/st.
