WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

10 July 2019

Vishrav Chaudhary

Francisco Guzmán

Papers citing "WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia"

6 / 56 papers shown

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan, Abhik Bhattacharjee, Kazi Samin Mubasshir, Masum Hasan, Madhusudan Basak, M. Rahman, Rifat Shahriyar
VLM 15 72 0
20 Sep 2020

A Multilingual Parallel Corpora Collection Effort for Indian Languages
Shashank Siripragrada, Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar
VLM 13 47 0
15 Jul 2020

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze
4 224 0
18 Apr 2020

Translation Artifacts in Cross-lingual Transfer Learning
Mikel Artetxe, Gorka Labaka, Eneko Agirre
6 114 0
09 Apr 2020

JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita, Jun Suzuki, Masaaki Nagata
LRM 30 64 0
25 Nov 2019

MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
Marcely Zanon Boito, William N. Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier
20 46 0
30 Jul 2019