ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1911.04944
  4. Cited By
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB

CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB

10 November 2019
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
ArXivPDFHTML

Papers citing "CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB"

26 / 26 papers shown
Title
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
Menglong Cui
Pengzhi Gao
Wei Liu
Jian Luan
Bin Wang
LRM
41
0
0
04 Feb 2025
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
32
2
0
12 Oct 2024
Cogs in a Machine, Doing What They're Meant to Do -- The AMI Submission
  to the WMT24 General Translation Task
Cogs in a Machine, Doing What They're Meant to Do -- The AMI Submission to the WMT24 General Translation Task
Atli Jasonarson
Hinrik Hafsteinsson
Bjarki Ármannsson
Steinþór Steingrímsson
SyDa
27
2
0
04 Oct 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
35
3
0
26 Sep 2024
Modular Sentence Encoders: Separating Language Specialization from
  Cross-Lingual Alignment
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment
Yongxin Huang
Kexin Wang
Goran Glavavs
Iryna Gurevych
44
0
0
20 Jul 2024
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Pasin Manurangsi
Amer Sinha
Chulin Xie
Chiyuan Zhang
51
1
0
23 Jun 2024
Critical Learning Periods: Leveraging Early Training Dynamics for
  Efficient Data Pruning
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
29
4
0
29 May 2024
DUB: Discrete Unit Back-translation for Speech Translation
DUB: Discrete Unit Back-translation for Speech Translation
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
11
22
0
19 May 2023
Escaping the sentence-level paradigm in machine translation
Escaping the sentence-level paradigm in machine translation
Matt Post
Marcin Junczys-Dowmunt
24
26
0
25 Apr 2023
Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine
  Translation
Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation
Alex Jones
Isaac Caswell
Ishan Saxena
Orhan Firat
16
8
0
27 Mar 2023
MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for
  Natural Language Understanding in Task-Oriented Dialogue
MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue
Nikita Moghe
E. Razumovskaia
Liane Guillou
Ivan Vulić
Anna Korhonen
Alexandra Birch
19
13
0
20 Dec 2022
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
Jian Yang
Shuming Ma
Li Dong
Shaohan Huang
Haoyang Huang
Yuwei Yin
Dongdong Zhang
Liqun Yang
Furu Wei
Zhoujun Li
SyDa
AI4CE
27
25
0
20 Dec 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual
  Speech-to-Speech Translations
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Paul-Ambroise Duquenne
Hongyu Gong
Ning Dong
Jingfei Du
Ann Lee
Vedanuj Goswani
Changhan Wang
J. Pino
Benoît Sagot
Holger Schwenk
21
34
0
08 Nov 2022
Leveraging Affirmative Interpretations from Negation Improves Natural
  Language Understanding
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding
Md Mosharaf Hossain
Eduardo Blanco
22
4
0
26 Oct 2022
Graphemic Normalization of the Perso-Arabic Script
Graphemic Normalization of the Perso-Arabic Script
R. Doctor
Alexander Gutkin
Cibu Johny
Brian Roark
R. Sproat
33
4
0
21 Oct 2022
Separating Grains from the Chaff: Using Data Filtering to Improve
  Multilingual Translation for Low-Resourced African Languages
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Idris Abdulmumin
Michael Beukman
Jesujoba Oluwadara Alabi
Chris C. Emezue
Everlyn Asiko
...
Shamsuddeen Hassan Muhammad
Mofetoluwa Adeyemi
Oreen Yousuf
Sahib Singh
T. Gwadabe
21
6
0
19 Oct 2022
The first neural machine translation system for the Erzya language
The first neural machine translation system for the Erzya language
David Dale
52
7
0
19 Sep 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual
  Speech Representation
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
19
36
0
17 May 2022
CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with
  Application to Formality
CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality
Maria Nadejde
Anna Currey
B. Hsu
Xing Niu
Marcello Federico
Georgiana Dinu
14
24
0
09 May 2022
DeepNet: Scaling Transformers to 1,000 Layers
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
15
155
0
01 Mar 2022
Self-Supervised Knowledge Assimilation for Expert-Layman Text Style
  Transfer
Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer
Wenda Xu
Michael Stephen Saxon
Misha Sra
W. Wang
MedIm
11
12
0
06 Oct 2021
Survey of Low-Resource Machine Translation
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindvrich Helcl
Alexandra Birch
AIMat
19
147
0
01 Sep 2021
Facebook AI WMT21 News Translation Task Submission
Facebook AI WMT21 News Translation Task Submission
C. Tran
Shruti Bhosale
James Cross
Philipp Koehn
Sergey Edunov
Angela Fan
VLM
134
80
0
06 Aug 2021
A Survey on Low-Resource Neural Machine Translation
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
33
58
0
09 Jul 2021
Neural Machine Translation for Low-Resource Languages: A Survey
Neural Machine Translation for Low-Resource Languages: A Survey
Surangika Ranathunga
E. Lee
Marjana Prifti Skenduli
Ravi Shekhar
Mehreen Alam
Rishemjit Kaur
25
233
0
29 Jun 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using
  Model Parallelism
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
1