1911.06154
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
10 November 2019
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzman
Philipp Koehn
Papers citing "CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs"
18 / 18 papers shown
A kinetic-based regularization method for data science applications
Abhisek Ganguly
Alessandro Gabbana
Vybhav Rao
Sauro Succi
Santosh Ansumali
41
0
0
06 Mar 2025
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
Menglong Cui
Pengzhi Gao
Wei Liu
Jian Luan
Bin Wang
LRM
41
0
0
04 Feb 2025
How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise in Machine Translation
Yan Meng
Di Wu
Christof Monz
28
1
0
02 Jul 2024
GATE X-E : A Challenge Set for Gender-Fair Translations from Weakly-Gendered Languages
Spencer Rarrick
Ranjita Naik
Sundar Poudel
Vishal Chowdhary
27
1
0
22 Feb 2024
Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
Vilém Zouhar
AAML
30
0
0
29 Jan 2024
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Yaoming Zhu
Zewei Sun
Shanbo Cheng
Yuyang Huang
Liwei Wu
Mingxuan Wang
21
10
0
20 Dec 2022
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
Jian Yang
Shuming Ma
Li Dong
Shaohan Huang
Haoyang Huang
Yuwei Yin
Dongdong Zhang
Liqun Yang
Furu Wei
Zhoujun Li
SyDa
AI4CE
27
25
0
20 Dec 2022
Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models
Hongyuan Lu
Haoyang Huang
Shuming Ma
Dongdong Zhang
W. Lam
Furu Wei
19
4
0
15 Dec 2022
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Idris Abdulmumin
Michael Beukman
Jesujoba Oluwadara Alabi
Chris C. Emezue
Everlyn Asiko
...
Shamsuddeen Hassan Muhammad
Mofetoluwa Adeyemi
Oreen Yousuf
Sahib Singh
T. Gwadabe
21
6
0
19 Oct 2022
esCorpius: A Massive Spanish Crawling Corpus
Asier Gutiérrez-Fandiño
David Pérez-Fernández
Jordi Armengol-Estapé
D. Griol
Z. Callejas
26
2
0
30 Jun 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
15
155
0
01 Mar 2022
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
Julien Abadji
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
CLL
20
153
0
17 Jan 2022
Improving Large-scale Language Models and Resources for Filipino
Jan Christian Blaise Cruz
C. Cheng
AI4CE
24
27
0
11 Nov 2021
Improving Arabic Diacritization by Learning to Diacritize and Translate
Brian Thompson
A. Alshehri
27
10
0
29 Sep 2021
Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents
Biao Zhang
Ankur Bapna
Melvin Johnson
A. Dabirmoghaddam
N. Arivazhagan
Orhan Firat
26
12
0
21 Sep 2021
Facebook AI WMT21 News Translation Task Submission
C. Tran
Shruti Bhosale
James Cross
Philipp Koehn
Sergey Edunov
Angela Fan
VLM
134
80
0
06 Aug 2021
The USYD-JD Speech Translation System for IWSLT 2021
Liang Ding
Di Wu
Dacheng Tao
19
16
0
24 Jul 2021
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
33
58
0
09 Jul 2021