
Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages

Abstract

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit the linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. While the experimental results underline the complementary nature of APE and QE, we also observe that QE-APE multi-task learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by 2.5 and 2.39 TER points, respectively, with further notable improvements over the multilingual APE model observed through multi-task learning (+1.29 and +1.44 TER points), data augmentation (+0.53 and +0.45 TER points) and domain adaptation (+0.35 and +0.45 TER points). We release the synthetic data, code, and models accrued during this study publicly at https://github.com/cfiltnlp/Multilingual-APE.
