
No Text Needed: Forecasting MT Quality and Inequity from Fertility and Metadata

Main:5 Pages
2 Figures
Bibliography:2 Pages
4 Tables
Abstract

We show that translation quality can be predicted with surprising accuracy \textit{without ever running the translation system itself}. Using only a handful of features, namely token fertility ratios, token counts, and basic linguistic metadata (language family, script, and region), we forecast ChrF scores for GPT-4o translations across 203 languages in the FLORES-200 benchmark. Gradient boosting models achieve favorable performance ($R^{2}=0.66$ for XX$\rightarrow$English and $R^{2}=0.72$ for English$\rightarrow$XX). Feature importance analyses reveal that typological factors dominate predictions for translation into English, while fertility plays a larger role for translation from English into diverse target languages. These findings suggest that translation quality is shaped by both token-level fertility and broader linguistic typology, offering new insights for multilingual evaluation and quality estimation.
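As a rough illustration of the kind of text-free features the abstract describes, the sketch below computes a fertility ratio (subword tokens per whitespace-delimited word) and a token count, then combines them with metadata fields into one feature row. It is a minimal sketch under stated assumptions: the abstract does not give the exact fertility definition or the tokenizer used, so `toy_subword_tokenize`, `fertility`, and `build_feature_row` are hypothetical names, and the chunking tokenizer is only a stand-in for a real subword tokenizer.

```python
def toy_subword_tokenize(text, max_piece=4):
    """Hypothetical stand-in for a real subword tokenizer.

    Splits each whitespace-delimited word into chunks of at most
    `max_piece` characters. Not the tokenizer used in the paper.
    """
    pieces = []
    for word in text.split():
        for start in range(0, len(word), max_piece):
            pieces.append(word[start:start + max_piece])
    return pieces


def fertility(text, tokenize=toy_subword_tokenize):
    """Assumed definition: subword tokens per whitespace-delimited word."""
    words = text.split()
    if not words:
        return 0.0
    return len(tokenize(text)) / len(words)


def build_feature_row(text, family, script, region):
    """Combine token-level features with language metadata.

    The field names are illustrative; categorical fields would still
    need encoding before being passed to a regression model.
    """
    return {
        "fertility": fertility(text),
        "token_count": len(toy_subword_tokenize(text)),
        "family": family,
        "script": script,
        "region": region,
    }


row = build_feature_row("hello world", "Indo-European", "Latin", "Europe")
```

Rows like this, one per language, could then be fed to any gradient boosting regressor with the measured ChrF score as the target; no translation output is needed to build them.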
