1906.08885
Cited By
Low-Resource Corpus Filtering using Multilingual Sentence Embeddings
20 June 2019
Vishrav Chaudhary
Y. Tang
Francisco Guzmán
Holger Schwenk
Philipp Koehn
Papers citing
"Low-Resource Corpus Filtering using Multilingual Sentence Embeddings"
11 / 11 papers shown
Title
Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations
Mahjabin Nahar
Eun-Ju Lee
Jin Won Park
Dongwon Lee
HILM
75
0
0
01 Apr 2025
A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain
Jorge del Pozo Lérida
Kamil Kojs
János Máté
Mikołaj Antoni Barański
Christian Hardmeier
42
0
0
27 Jan 2025
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
34
2
0
12 Oct 2024
Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text
Isaac Caswell
Lisa Wang
Isabel Papadimitriou
26
0
0
11 Nov 2023
There's no Data Like Better Data: Using QE Metrics for MT Data Filtering
Jan-Thorsten Peter
David Vilar
Daniel Deutsch
Mara Finkelstein
Juraj Juraska
Markus Freitag
14
16
0
09 Nov 2023
Improve Sentence Alignment by Divide-and-conquer
Wu Zhang
16
0
0
18 Jan 2022
Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer
Wenda Xu
Michael Stephen Saxon
Misha Sra
Luu Anh Tuan
MedIm
13
13
0
06 Oct 2021
Detecting Hallucinated Content in Conditional Neural Sequence Generation
Chunting Zhou
Graham Neubig
Jiatao Gu
Mona T. Diab
P. Guzmán
Luke Zettlemoyer
Marjan Ghazvininejad
HILM
39
195
0
05 Nov 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
17
72
0
20 Sep 2020
Cross-lingual Retrieval for Iterative Self-Supervised Training
C. Tran
Y. Tang
Xian Li
Jiatao Gu
RALM
28
72
0
16 Jun 2020
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
CVBM
24
400
0
10 Jul 2019