Advancing Bangla Machine Translation Through Informal Datasets

Ayon Roy
Risat Rahaman
Sadat Shibly
Udoy Saha Joy
Abdulla Al Kafi
Farig Yousuf Sadeque
Main: 30 Pages
11 Figures
Bibliography: 3 Pages
4 Tables
Abstract

Bangla is the sixth most widely spoken language globally, with approximately 234 million native speakers. However, progress in open-source Bangla machine translation remains limited. Most online resources are in English and often remain untranslated into Bangla, which excludes millions of people from essential information. Existing research on Bangla translation focuses primarily on formal language and neglects the more commonly used informal language, largely because pairwise Bangla-English data and advanced translation models are lacking. If datasets and models can be improved to better handle natural, informal Bangla, millions of people will benefit from better access to online information. In this research, we explore current state-of-the-art models and propose improvements to Bangla translation by developing a dataset from informal sources such as social media and conversational texts. This work aims to advance Bangla machine translation by focusing on informal language and improving accessibility for Bangla speakers in the digital world.
