WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

10 July 2019
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
    CVBM

Papers citing "WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia"

50 / 54 papers shown
A kinetic-based regularization method for data science applications
Abhisek Ganguly
Alessandro Gabbana
Vybhav Rao
Sauro Succi
Santosh Ansumali
41
0
0
06 Mar 2025
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
32
2
0
12 Oct 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
35
3
0
26 Sep 2024
Latent Space Translation via Inverse Relative Projection
Valentino Maiorca
Luca Moschella
Marco Fumero
Francesco Locatello
Emanuele Rodolà
29
1
0
21 Jun 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
34
4
0
29 May 2024
Relay Decoding: Concatenating Large Language Models for Machine Translation
Chengpeng Fu
Xiaocheng Feng
Yi-Chong Huang
Wenshuai Huo
Baohang Li
Hui Wang
Bing Qin
Ting Liu
24
0
0
05 May 2024
GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages
Spencer Rarrick
Ranjita Naik
Sundar Poudel
Vishal Chowdhary
27
1
0
22 Feb 2024
The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics
Nikolay Bogoychev
Pinzhen Chen
Barry Haddow
Alexandra Birch
25
0
0
16 Nov 2023
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?
Fei Yuan
Shuai Yuan
Zhiyong Wu
Lei Li
20
10
0
15 Nov 2023
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
Nandan Thakur
Jianmo Ni
Gustavo Hernández Ábrego
John Wieting
Jimmy J. Lin
Daniel Matthew Cer
RALM
23
12
0
10 Nov 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
29
95
0
20 May 2023
ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages
Sourojit Ghosh
Aylin Caliskan
20
69
0
17 May 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou
Yunlong Liang
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
VLM
16
4
0
13 May 2023
Escaping the sentence-level paradigm in machine translation
Matt Post
Marcin Junczys-Dowmunt
24
26
0
25 Apr 2023
A Survey of Corpora for Germanic Low-Resource Languages and Dialects
Verena Blaschke
Hinrich Schütze
Barbara Plank
19
13
0
19 Apr 2023
Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation
Alex Jones
Isaac Caswell
Ishan Saxena
Orhan Firat
16
8
0
27 Mar 2023
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Yekun Chai
Shuohuan Wang
Chao Pang
Yu Sun
Hao Tian
Hua-Hong Wu
14
35
0
13 Dec 2022
Towards a general purpose machine translation system for Sranantongo
Just Zwennicker
David Stap
11
4
0
13 Dec 2022
CUNI Systems for the WMT22 Czech-Ukrainian Translation Task
Martin Popel
Jindřich Libovický
Jindřich Helcl
14
4
0
01 Dec 2022
Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources
Xinyan Velocity Yu
Akari Asai
Trina Chatterjee
Junjie Hu
Eunsol Choi
16
21
0
28 Nov 2022
TSMind: Alibaba and Soochow University's Submission to the WMT22 Translation Suggestion Task
Xin Ge
Ke Min Wang
Jiayi Wang
Nini Xiao
Xiangyu Duan
Yu Zhao
Yuqi Zhang
14
2
0
16 Nov 2022
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
MLLM
VLM
11
7
0
09 Nov 2022
Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation
Danni Liu
J. Niehues
12
5
0
02 Nov 2022
Very Low Resource Sentence Alignment: Luhya and Swahili
E. Chimoto
Bruce A. Bassett
CVBM
11
10
0
31 Oct 2022
Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings
Shuhao Gu
Yang Feng
12
11
0
28 Oct 2022
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding
Md Mosharaf Hossain
Eduardo Blanco
22
4
0
26 Oct 2022
RuCoLA: Russian Corpus of Linguistic Acceptability
Vladislav Mikhailov
T. Shamardina
Max Ryabinin
A. Pestova
I. Smurov
Ekaterina Artemova
17
28
0
23 Oct 2022
The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)
Faheem Kirefu
Vivek Iyer
Pinzhen Chen
Laurie Burchell
MoE
21
1
0
20 Oct 2022
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Idris Abdulmumin
Michael Beukman
Jesujoba Oluwadara Alabi
Chris C. Emezue
Everlyn Asiko
...
Shamsuddeen Hassan Muhammad
Mofetoluwa Adeyemi
Oreen Yousuf
Sahib Singh
T. Gwadabe
21
6
0
19 Oct 2022
Exploring Diversity in Back Translation for Low-Resource Machine Translation
Laurie Burchell
Alexandra Birch
Kenneth Heafield
21
15
0
01 Jun 2022
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
Wenxuan Wang
Wenxiang Jiao
Shuo Wang
Zhaopeng Tu
Michael R. Lyu
UQLM
27
9
0
20 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
22
36
0
17 May 2022
Non-Autoregressive Machine Translation: It's Not as Fast as it Seems
Jindřich Helcl
Barry Haddow
Alexandra Birch
14
19
0
04 May 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
38
98
0
24 Mar 2022
Dataset Geography: Mapping Language Data to Language Users
Fahim Faisal
Yinkai Wang
Antonios Anastasopoulos
54
23
0
07 Dec 2021
Improving Large-scale Language Models and Resources for Filipino
Jan Christian Blaise Cruz
C. Cheng
AI4CE
24
27
0
11 Nov 2021
FacTeR-Check: Semi-automated fact-checking through Semantic Similarity and Natural Language Inference
Alejandro Martín
Javier Huertas-Tato
Álvaro Huertas-García
Guillermo Villar-Rodríguez
David Camacho
HILM
6
31
0
27 Oct 2021
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation
Long Doan
L. T. Nguyen
Nguyen Luong Tran
T. Hoang
Dat Quoc Nguyen
23
22
0
23 Oct 2021
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction
Shubhanshu Mishra
A. Haghighi
VLM
11
4
0
20 Oct 2021
Improving Arabic Diacritization by Learning to Diacritize and Translate
Brian Thompson
A. Alshehri
27
10
0
29 Sep 2021
Fine Grained Human Evaluation for English-to-Chinese Machine Translation: A Case Study on Scientific Text
Ming Liu
Heng Zhang
Guanhao Wu
26
1
0
13 Sep 2021
The Grammar-Learning Trajectories of Neural Language Models
Leshem Choshen
Guy Hacohen
D. Weinshall
Omri Abend
12
28
0
13 Sep 2021
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindřich Helcl
Alexandra Birch
AIMat
27
147
0
01 Sep 2021
The USYD-JD Speech Translation System for IWSLT 2021
Liang Ding
Di Wu
Dacheng Tao
19
16
0
24 Jul 2021
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
33
58
0
09 Jul 2021
Neural Machine Translation for Low-Resource Languages: A Survey
Surangika Ranathunga
E. Lee
Marjana Prifti Skenduli
Ravi Shekhar
Mehreen Alam
Rishemjit Kaur
25
233
0
29 Jun 2021
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
Zewen Chi
Li Dong
Bo Zheng
Shaohan Huang
Xian-Ling Mao
Heyan Huang
Furu Wei
37
67
0
11 Jun 2021
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal
Cynthia Gao
Vishrav Chaudhary
Peng-Jen Chen
Guillaume Wenzek
Da Ju
Sanjan Krishnan
Marc'Aurelio Ranzato
Francisco Guzmán
Angela Fan
8
550
0
06 Jun 2021
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
15
72
0
20 Sep 2020
A Multilingual Parallel Corpora Collection Effort for Indian Languages
Shashank Siripragada
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
VLM
11
47
0
15 Jul 2020