1807.11906
Cited By
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
31 July 2018
Mandy Guo
Qinlan Shen
Yinfei Yang
Heming Ge
Daniel Matthew Cer
Gustavo Hernández Ábrego
K. Stevens
Noah Constant
Yun-hsuan Sung
B. Strope
R. Kurzweil
Papers citing
"Effective Parallel Corpus Mining using Bilingual Sentence Embeddings"
50 / 64 papers shown
Title
MEXMA: Token-level objectives improve sentence representations
Joao Maria Janeiro
Benjamin Piwowarski
Patrick Gallinari
Loïc Barrault
26
1
0
19 Sep 2024
Learning Job Title Representation from Job Description Aggregation Network
Napat Laosaengpha
Thanit Tativannarat
Chawan Piansaddhayanon
Attapol Rutherford
E. Chuangsuwanich
27
1
0
12 Jun 2024
MINERS: Multilingual Language Models as Semantic Retrievers
Genta Indra Winata
Ruochen Zhang
David Ifeoluwa Adelani
RALM
47
5
0
11 Jun 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
39
4
0
29 May 2024
Enhancing Cross-lingual Sentence Embedding for Low-resource Languages with Word Alignment
Zhongtao Miao
Qiyu Wu
Kaiyan Zhao
Zilong Wu
Yoshimasa Tsuruoka
28
9
0
03 Apr 2024
Does Negative Sampling Matter? A Review with Insights into its Theory and Applications
Zhen Yang
Ming Ding
Tinglin Huang
Yukuo Cen
Junshuai Song
Bin Xu
Yuxiao Dong
Jie Tang
33
9
0
27 Feb 2024
EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning
Ping Guo
Xiangpeng Wei
Yue Hu
Baosong Yang
Dayiheng Liu
Fei Huang
Jun Xie
21
2
0
26 Oct 2023
The Effect of Alignment Objectives on Code-Switching Translation
Mohamed Anwar
16
1
0
10 Sep 2023
Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization
Pengzhi Gao
Liwen Zhang
Zhongjun He
Hua-Hong Wu
Haifeng Wang
27
5
0
12 Jun 2023
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding
Ziheng Li
Shaohan Huang
Zi-qiang Zhang
Zhi-Hong Deng
Qiang Lou
Haizhen Huang
Jian Jiao
Furu Wei
Weiwei Deng
Qi Zhang
30
10
0
16 May 2023
LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation
Zhuoyuan Mao
Tetsuji Nakagawa
FedML
19
19
0
16 Feb 2023
Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval
John Wieting
J. Clark
William W. Cohen
Graham Neubig
Taylor Berg-Kirkpatrick
21
6
0
21 Dec 2022
Very Low Resource Sentence Alignment: Luhya and Swahili
E. Chimoto
Bruce A. Bassett
CVBM
11
10
0
31 Oct 2022
Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation
Deng Cai
Xin Li
Jackie Chun-Sing Ho
Lidong Bing
W. Lam
23
6
0
18 Oct 2022
EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning
Zhuoyuan Mao
Chenhui Chu
Sadao Kurohashi
40
1
0
31 May 2022
Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages
Kevin Heffernan
Onur Çelebi
Holger Schwenk
25
53
0
25 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
32
70
0
04 May 2022
How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language
Shiyue Zhang
B. Frey
Mohit Bansal
14
36
0
25 Apr 2022
MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages
Gokul Karthik Kumar
Abhishek Singh Gehlot
Sahal Shaji Mullappilly
Karthik Nandakumar
26
13
0
12 Apr 2022
Quality Controlled Paraphrase Generation
Elron Bandel
R. Aharonov
Michal Shmueli-Scheuer
Ilya Shnayderman
Noam Slonim
L. Ein-Dor
19
38
0
21 Mar 2022
USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation
Jonas Belouadi
Steffen Eger
31
20
0
21 Feb 2022
Improve Sentence Alignment by Divide-and-conquer
Wu Zhang
16
0
0
18 Jan 2022
On Cross-Lingual Retrieval with Multilingual Text Encoders
Robert Litschko
Ivan Vulić
Simone Paolo Ponzetto
Goran Glavaš
11
38
0
21 Dec 2021
CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs
Abhik Bhattacharjee
Tahmid Hasan
Wasi Uddin Ahmad
Yuan-Fang Li
Yong-Bin Kang
Rifat Shahriyar
RALM
ELM
40
37
0
16 Dec 2021
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction
Shubhanshu Mishra
A. Haghighi
VLM
18
4
0
20 Oct 2021
Secoco: Self-Correcting Encoding for Neural Machine Translation
Tao Wang
Chengqi Zhao
Mingxuan Wang
Lei Li
Hang Li
Deyi Xiong
VLM
20
3
0
27 Aug 2021
Neural Machine Translation for Low-Resource Languages: A Survey
Surangika Ranathunga
E. Lee
Marjana Prifti Skenduli
Ravi Shekhar
Mehreen Alam
Rishemjit Kaur
27
235
0
29 Jun 2021
LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models
Hongyu Gong
Vishrav Chaudhary
Yuqing Tang
Francisco Guzmán
19
3
0
07 Jun 2021
Lightweight Cross-Lingual Sentence Representation Learning
Zhuoyuan Mao
Prakhar Gupta
Pei Wang
Chenhui Chu
Martin Jaggi
Sadao Kurohashi
VLM
19
8
0
28 May 2021
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining
Ivana Kvapilíková
Mikel Artetxe
Gorka Labaka
Eneko Agirre
Ondrej Bojar
SSL
13
36
0
21 May 2021
Paraphrastic Representations at Scale
John Wieting
Kevin Gimpel
Graham Neubig
Taylor Berg-Kirkpatrick
19
19
0
30 Apr 2021
"Wikily" Supervised Neural Translation Tailored to Cross-Lingual Tasks
Mohammad Sadegh Rasooli
Chris Callison-Burch
Derry Wijaya
CLIP
15
5
0
16 Apr 2021
Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages
Garry Kuwanto
Afra Feyza Akyürek
Isidora Chara Tourni
Siyang Li
Alex Jones
Derry Wijaya
19
5
0
24 Mar 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
27
59
0
22 Mar 2021
Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval
Robert Litschko
Ivan Vulić
Simone Paolo Ponzetto
Goran Glavaš
13
23
0
21 Jan 2021
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment
Freda Shi
Luke Zettlemoyer
Sida I. Wang
SSL
24
32
0
01 Jan 2021
Neural Passage Retrieval with Improved Negative Contrast
Jing Lu
Gustavo Hernández Ábrego
Ji Ma
Jianmo Ni
Yinfei Yang
21
25
0
23 Oct 2020
Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings
Phillip Keung
Julian Salazar
Y. Lu
Noah A. Smith
SSL
25
25
0
15 Oct 2020
Semantic Label Smoothing for Sequence to Sequence Problems
Michal Lukasik
Himanshu Jain
A. Menon
Seungyeon Kim
Srinadh Bhojanapalli
Felix X. Yu
Sanjiv Kumar
AI4TS
17
18
0
15 Oct 2020
ComStreamClust: a communicative multi-agent approach to text clustering in streaming data
Ali Najafi
Araz Gholipour-Shilabin
Rahim Dehkharghani
Ali Mohammadpur-Fard
M. Asgari-Chenaghlu
16
1
0
11 Oct 2020
Multilevel Text Alignment with Cross-Document Attention
Xuhui Zhou
Nikolaos Pappas
Noah A. Smith
30
18
0
03 Oct 2020
Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
Yinfei Yang
Ning Jin
Kuo Lin
Mandy Guo
Daniel Matthew Cer
21
31
0
29 Sep 2020
Language-agnostic BERT Sentence Embedding
Fangxiaoyu Feng
Yinfei Yang
Daniel Matthew Cer
N. Arivazhagan
Wei Wang
19
869
0
03 Jul 2020
Cross-lingual Retrieval for Iterative Self-Supervised Training
C. Tran
Y. Tang
Xian Li
Jiatao Gu
RALM
28
72
0
16 Jun 2020
MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models
Mandy Guo
Yinfei Yang
Daniel Matthew Cer
Qinlan Shen
Noah Constant
LRM
30
46
0
05 May 2020
Exploiting Sentence Order in Document Alignment
Brian Thompson
Philipp Koehn
19
19
0
30 Apr 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Nils Reimers
Iryna Gurevych
25
994
0
21 Apr 2020
Contextual Lensing of Universal Sentence Representations
J. Kiros
13
5
0
20 Feb 2020
Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance
Ahmed El-Kishky
Francisco Guzmán
16
15
0
31 Jan 2020
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
25
254
0
10 Nov 2019