arXiv:1710.01025

MMCR4NLP: Multilingual Multiway Corpora Repository for Natural Language Processing

3 October 2017
Raj Dabre
Sadao Kurohashi
Abstract

Multilinguality is gradually becoming ubiquitous, in the sense that more and more researchers have successfully shown that using additional languages helps improve results in many Natural Language Processing tasks. Multilingual Multiway Corpora (MMC) contain the same sentence in multiple languages. Such corpora have been used primarily for Multi-Source and Pivot Language Machine Translation, but they are also useful for developing multilingual sequence taggers by transfer learning. Since there is no official MMC collection, researchers tend to use their own datasets, which makes it difficult to compare methods. We therefore present our work on creating a unified repository of MMC spanning a large number of languages. We hope that this will help speed up the pace of multilingual NLP research and ensure that NLP researchers obtain results that are more trustworthy, since they can be compared easily. We indicate corpora sources, extraction procedures (if any), and relevant statistics. We also make our collection public for research purposes.
