
English-Twi Parallel Corpus for Machine Translation

29 March 2021
P. Azunre
Salomey Osei
S. Addo
Lawrence Asamoah Adu-Gyamfi
Stephen E. Moore
Bernard Adabankah
Bernard Opoku
Clara Asare-Nyarko
S. Nyarko
Cynthia Amoaba
Esther Dansoa Appiah
Felix Akwerh
Richard Nii Lante Lawson
Joel Budu
E. Debrah
N. Boateng
Wisdom Ofori
Edwin Buabeng-Munkoh
F. Adjei
Isaac K. E. Ampomah
Joseph Otoo
R. Borkor
Standylove Birago Mensah
Lucien Mensah
Mark Amoako Marcel
A. Amponsah
J. B. Hayfron-Acquah
Abstract

We present a parallel machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. We used a transformer-based translator to generate initial translations into Akuapem Twi, which were then verified and corrected where necessary by native speakers to eliminate any occurrence of translationese. In addition, 697 higher-quality crowd-sourced sentences are provided for use as an evaluation set for downstream Natural Language Processing (NLP) tasks. The typical use case for the larger human-verified dataset is further training of machine translation models for Akuapem Twi. The higher-quality crowd-sourced set of 697 sentences is recommended as a test set for English-to-Twi and Twi-to-English machine translation models. Furthermore, the Twi portion of the crowd-sourced data may also be used for other tasks, such as representation learning and classification. We fine-tune the transformer translation model on the training corpus and report benchmarks on the crowd-sourced test set.
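
As a rough illustration of the workflow the abstract describes (fine-tuning a transformer translation model on the parallel training corpus), the sketch below trains an off-the-shelf OPUS-MT English-to-Twi checkpoint on a tab-separated English/Twi file. The file name en_twi_parallel.tsv and the checkpoint Helsinki-NLP/opus-mt-en-tw are illustrative assumptions, not artifacts released with the paper, and this is not the authors' code.

```python
# Minimal fine-tuning sketch for an English->Twi transformer MT model.
# Assumptions (not from the paper): the corpus is a local TSV file named
# "en_twi_parallel.tsv" with one "English<TAB>Twi" pair per line, and a
# pretrained OPUS-MT checkpoint "Helsinki-NLP/opus-mt-en-tw" is available.
import csv

import torch
from torch.utils.data import DataLoader, Dataset
from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-en-tw"  # assumed pretrained en->tw checkpoint


class ParallelCorpus(Dataset):
    """English-Twi sentence pairs read from a tab-separated file."""

    def __init__(self, path, tokenizer, max_len=128):
        self.pairs = []
        with open(path, encoding="utf-8") as f:
            for row in csv.reader(f, delimiter="\t"):
                if len(row) == 2:
                    self.pairs.append((row[0], row[1]))
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        en, tw = self.pairs[idx]
        # Tokenize the English source and the Twi target in one call.
        batch = self.tokenizer(
            en,
            text_target=tw,
            max_length=self.max_len,
            truncation=True,
            padding="max_length",
            return_tensors="pt",
        )
        item = {k: v.squeeze(0) for k, v in batch.items()}
        # Ignore padding positions when computing the target loss.
        item["labels"][item["labels"] == self.tokenizer.pad_token_id] = -100
        return item


tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
model = MarianMTModel.from_pretrained(MODEL_NAME)

train_loader = DataLoader(
    ParallelCorpus("en_twi_parallel.tsv", tokenizer), batch_size=16, shuffle=True
)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for batch in train_loader:
    optimizer.zero_grad()
    loss = model(**batch).loss  # cross-entropy over the Twi target tokens
    loss.backward()
    optimizer.step()
```

After training, the held-out crowd-sourced sentences can be translated with model.generate and scored with a metric such as BLEU (for example via sacrebleu), mirroring the benchmarking step mentioned in the abstract.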
