A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

11 January 2024

Mehak Preet Dhaliwal

Marcello Federico

Papers citing "A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism"

Title | Authors | Tags | Counts | Date
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most? | HyoJung Han, Akiko Eriguchi, Haoran Xu, Hieu T. Hoang, Marine Carpuat, Huda Khayrallah | VLM | 32 2 0 | 12 Oct 2024
Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition | Saied Alshahrani, Hesham Haroon, Ali Elfilali, Mariama Njie, Jeanna Neefe Matthews | | 21 0 0 | 31 Mar 2024
The Falcon Series of Open Language Models | Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra-Aimée Cojocaru, ..., Quentin Malartic, Daniele Mazzotta, Badreddine Noune, B. Pannier, Guilherme Penedo | AI4TS, ALM | 113 389 0 | 28 Nov 2023
What Language Model to Train if You Have One Million GPU Hours? | Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, ..., Lintang Sutawika, Jaesung Tae, Zheng-Xin Yong, Julien Launay, Iz Beltagy | MoE, AI4CE | 225 103 0 | 27 Oct 2022
Embarrassingly Easy Document-Level MT Metrics: How to Convert Any Pretrained Metric Into a Document-Level Metric | Giorgos Vernikos, Brian Thompson, Prashant Mathur, Marcello Federico | | 36 40 0 | 27 Sep 2022
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task | Ricardo Rei, Marcos Vinícius Treviso, Nuno M. Guerreiro, Chrysoula Zerva, Ana C. Farinha, ..., T. Glushkova, Duarte M. Alves, A. Lavie, Luísa Coheur, André F. T. Martins | | 52 137 0 | 13 Sep 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 245 1,977 0 | 31 Dec 2020