
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation

6 February 2025
Omnilingual MT Team
Pierre Yves Andrews
Mikel Artetxe
Mariano Coria Meglioli
Marta R. Costa-jussà
Joe Chuang
David Dale
Cynthia Gao
Jean Maillard
Alex Mourachko
C. Ropers
Safiyyah Saleem
Eduardo Sánchez
Ioannis Tsiamas
Arina Turkatenko
Albert Ventayol-Boada
Shireen Yates
Abstract

This paper presents BOUQuET, a multicentric, multi-register and multi-domain dataset and benchmark, together with a broader collaborative initiative to extend it. The dataset is handcrafted first in non-English languages. Each of these source languages is among the 23 languages used by half of the world's population, so each can potentially serve as a pivot language that enables more accurate translation. The dataset is specifically designed to avoid contamination and to be multicentric, so that multilingual language features are represented. In addition, the dataset goes beyond the sentence level: it is organized in paragraphs of various lengths. Compared with related machine translation (MT) datasets, we show that BOUQuET covers a broader range of domains while simplifying the translation task for non-experts. BOUQuET is therefore especially suitable for the open initiative and call for translation participation that we are launching to extend it into a multi-way parallel corpus covering any written language.

@article{team2025_2502.04314,
  title={BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation},
  author={Omnilingual MT Team and Pierre Andrews and Mikel Artetxe and Mariano Coria Meglioli and Marta R. Costa-jussà and Joe Chuang and David Dale and Cynthia Gao and Jean Maillard and Alex Mourachko and Christophe Ropers and Safiyyah Saleem and Eduardo Sánchez and Ioannis Tsiamas and Arina Turkatenko and Albert Ventayol-Boada and Shireen Yates},
  journal={arXiv preprint arXiv:2502.04314},
  year={2025}
}